Preface


Introduction 


This fully revised and updated edition of A Concise Course in Advanced Level Statistics is a 
comprehensive text for use primarily by students and teachers of Advanced Level 
Mathematics, both at AS and A2 level. It also provides a useful support for those studying 
statistics as part of science, social science and humanities courses. 


Features 


• Points of theory are explained concisely and illustrated clearly by worked examples, many taken from Advanced Level papers.

• Carefully graded exercises help you to consolidate ideas and gain experience in applying theory to different situations.

• Frequent hints pinpoint common misunderstandings and reinforce ideas.

• Key concepts and formulae are highlighted in colour to increase clarity. Frequent summaries provide a quick reference.

• Extensive miscellaneous exercises and end-of-chapter tests provide practice in tackling examination questions and essential examination preparation.

• Answers to all exercises are provided.

• An ICT supplement explores the use of ICT in the study of statistics.


Specifications 


The text covers the main theory required in the specifications of all the examination boards 
for the statistics sections of AS and A2 Mathematics. 


Examination Questions 


We are grateful to the following Awarding Bodies for permission to reproduce questions from 
their past examinations: 


• Assessment and Qualifications Alliance (AQA), including Northern Examinations and Assessment Board (NEAB/JMB) and Associated Examining Board (AEB)

• The Edexcel Foundation, including University of London Examinations and Assessment Council (L)

• Mathematics in Education and Industry (MEI)

• Oxford, Cambridge and RSA (OCR), including University of Cambridge Local Examinations Syndicate (C), Oxford & Cambridge Schools Examination Board (O & C) and Oxford Delegacy of Local Examinations (O)

• Welsh Joint Education Committee (WJEC)


Answers and worked solutions provided for examination questions are the responsibility of the authors.

We hope that you will enjoy using this text and that it will enhance your understanding of 
Statistics and give you confidence to succeed. 

J Crawshaw & J Chambers 

2001 


Representation and summary of data 


In this chapter you will learn about

• discrete and continuous data
• stem and leaf diagrams (stemplots)
• histograms, frequency polygons and the shape of a distribution
• pie charts
• means and weighted means
• standard deviation and variance
• cumulative frequency
• medians, quartiles and inter-percentile ranges
• skewness, including Pearson's coefficient and quartile coefficient
• the shape of the normal distribution
• box-and-whisker diagrams (boxplots) and outliers


DISCRETE DATA 


In a survey of 1 m² quadrats in a field, the number of snails in each of 30 quadrats was recorded as follows:


Vida O23 he ee De She Dee Be Bee 
232 32 Oe be Pe 203233 


This is an example of discrete raw data. 
Discrete data can take only exact values, for example 


the number of cars passing a checkpoint in 30 minutes, 
the shoe sizes of children in a class, 
the number of tomatoes on each plant in a greenhouse. 


The data are known as raw because they have not been ordered in any way. 




Frequency distribution for discrete data 


To illustrate the data more concisely, count the number of times each value occurs and 
summarise these in a table, known as a frequency distribution. 


Number of snails   0   1   2    3   4   5   Total
Frequency          3   5   11   8   2   1   30


The frequency distribution can be represented diagrammatically by a vertical line graph or a bar chart. The height of the line or bar represents the frequency.

[Figure: 'Vertical line graph to show number of snails' and 'Bar chart to show number of snails'; horizontal axis: number of snails (0 to 5); vertical axis: frequency]


Notice that 


• in the vertical line graph the distinct lines reinforce the discrete nature of the variable,
• in the bar chart the bars are all the same width and they are labelled in the middle of the bar on the horizontal axis.


The mode 


The mode is the value that occurs most often. 


The mode is the most popular value; the word derives from the French 'à la mode', meaning fashionable. It is easy to see from the diagrams above that the mode is 2 snails per quadrat.
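The same reading can be checked mechanically. The Python sketch below is illustrative only: it types in the frequency table above by hand and picks the value with the largest frequency. It assumes there is a single mode, and `mode` here is a hypothetical helper, not a library function.

```python
# Frequency distribution of snails per quadrat (typed in from the table above)
frequencies = {0: 3, 1: 5, 2: 11, 3: 8, 4: 2, 5: 1}

def mode(freq_table):
    """Value with the highest frequency (assumes a single mode)."""
    return max(freq_table, key=freq_table.get)

print("Total frequency:", sum(frequencies.values()))  # 30
print("Mode:", mode(frequencies))                     # 2
```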


CONTINUOUS DATA 


The following data were obtained in a survey of the heights of 20 children in a sports club. 
Each height was measured to the nearest centimetre. 


133 136 120 138 133 131 127 141 127 143
130 131 125 144 128 134 135 137 133 129


This is an example of continuous raw data. 




Continuous data cannot take exact values but can be given only within a specified range or measured to a specified degree of accuracy.

For example, the measurement 144 cm (given to the nearest cm) could have arisen from any value in the interval 143.5 cm ≤ h < 144.5 cm.


Other examples of continuous data are 


the speed of a vehicle as it passes a checkpoint, 
the mass of a cooking apple, 
the time taken by a volunteer to perform a task. 


Frequency distribution for continuous data 


To form a frequency distribution of the heights of the 20 children, group the information into 
classes or intervals. Here are three different ways of writing the same set of intervals. 


Height (cm)           Height (cm)    Height (to the nearest cm)
119.5 ≤ h < 124.5     119.5-124.5    120-124
124.5 ≤ h < 129.5     124.5-129.5    125-129
129.5 ≤ h < 134.5     129.5-134.5    130-134
134.5 ≤ h < 139.5     134.5-139.5    135-139
139.5 ≤ h < 144.5     139.5-144.5    140-144


The values 119.5, 124.5, 129.5, ... are called the class boundaries or the interval boundaries. The upper class boundary (u.c.b.) of one interval is the lower class boundary (l.c.b.) of the next interval.


Width of an interval 


The width of an interval is the difference between the boundaries. 
Width of an interval = upper class boundary − lower class boundary


Often intervals of equal width are chosen, as in the illustrations above, in which each width is 5 cm.


To group the heights it helps to use a tally column, entering the numbers in the first row (133, 136, 120, ...) and then those in the second row. It is a good idea to cross off each number in the list as you enter it. The frequency distribution for the above data should read:


Height (cm)           Tally     Frequency
119.5 ≤ h < 124.5     |         1
124.5 ≤ h < 129.5     𝍸         5
129.5 ≤ h < 134.5     𝍸 ||      7
134.5 ≤ h < 139.5     ||||      4
139.5 ≤ h < 144.5     |||       3


It is important to note that when the data are presented only in the form of a grouped 
frequency distribution, the original information has been lost. For example you would know 
that there was one item in the first interval, but you would not know what it was. You would 
know only that it was between 119.5 cm and 124.5 cm. 
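The tally step can be sketched in Python, assuming the 20 heights are transcribed in the order listed above (first row, then second row) and that each interval includes its lower class boundary but not its upper one (lcb ≤ h < ucb). The helper name `group` is hypothetical.

```python
# Heights (cm, to the nearest cm) of the 20 children, first row then second row
heights = [133, 136, 120, 138, 133, 131, 127, 141, 127, 143,
           130, 131, 125, 144, 128, 134, 135, 137, 133, 129]

# Class boundaries 119.5, 124.5, ..., 144.5 (each interval is 5 cm wide)
boundaries = [119.5 + 5 * k for k in range(6)]

def group(data, bounds):
    """Frequency of values with lcb <= x < ucb for each consecutive pair of boundaries."""
    return [sum(1 for x in data if lcb <= x < ucb)
            for lcb, ucb in zip(bounds, bounds[1:])]

for lcb, ucb, f in zip(boundaries, boundaries[1:], group(heights, boundaries)):
    print(f"{lcb} <= h < {ucb}: {f}")
```

Because the heights are whole numbers and the boundaries end in .5, no value can fall exactly on a boundary, so the choice of ≤ or < at the ends makes no difference here.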


STEM AND LEAF DIAGRAMS (STEMPLOTS) 


A very useful way of grouping data into classes while still retaining the original data is to 
draw a stem and leaf diagram, also known as a stemplot. 


These are the marks of 20 students in an assignment: 


84 17 38 45 47 53 76 54 75 22
66 65 55 54 51 44 39 19 54 72


Notice that the lowest mark is 17 and the highest mark is 84. 
In stem and leaf diagrams, all the intervals must be of equal width, so it seems sensible to choose intervals 10-19, 20-29, 30-39, ..., 80-89 for these data.


Take the stem to represent the tens and the leaf to represent the units. 


The first five entries, 84, 17, 38, 45 and 47, are represented like this:

Stem (tens) | Leaf (units)
1 | 7
2 |
3 | 8
4 | 5 7
5 |
6 |
7 |
8 | 4

When all the numbers have been entered the diagram looks like this:

Stem | Leaf
1 | 7 9
2 | 2
3 | 8 9
4 | 5 7 4
5 | 3 4 5 4 1 4
6 | 6 5
7 | 6 5 2
8 | 4


The entries in each leaf are now arranged in numerical order and a key is given to explain the stem and leaf. The final diagram looks like this:

Stem and leaf diagram to show assignment marks

Stem | Leaf
1 | 7 9
2 | 2
3 | 8 9
4 | 4 5 7
5 | 1 3 4 4 4 5
6 | 5 6
7 | 2 5 6
8 | 4

Key 1|7 means 17 marks


The stemplot gives a good idea at a glance of the shape of the distribution. It is easy to pick 
out the smallest and largest values and to see that the mode is 54. It is also obvious that the 
modal class is 50-59. 
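As an illustration only, an ordered stemplot with stem = tens digit and leaf = units digit can be produced in a few lines of Python. The function `stemplot` is a hypothetical helper written for this example; the marks are those listed above.

```python
from collections import defaultdict

# Assignment marks of the 20 students
marks = [84, 17, 38, 45, 47, 53, 76, 54, 75, 22,
         66, 65, 55, 54, 51, 44, 39, 19, 54, 72]

def stemplot(data):
    """Ordered stem and leaf diagram: stem = tens digit, leaf = units digit."""
    leaves = defaultdict(list)
    for x in data:
        leaves[x // 10].append(x % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem} | {row}".rstrip())  # stems with no leaves are kept
    return lines

print("\n".join(stemplot(marks)))
print("Key 1|7 means 17 marks")
```

Sorting the leaves on each stem is exactly the rearrangement step described above; the key still has to be supplied by hand.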


Example 1.1 


The maximum temperature in °C, measured to the nearest degree, was recorded each day 
during June in Sutton with the following results: 


19222322 192.192 202.12), 19: 22 22 POE 18 16.19.2002 17. 
13°14 12 15 17 16 17 19 22 22 20 19 19.20°°20 


Draw a stem and leaf diagram to illustrate the temperatures and write down the modal temperature.


Solution 1.1 


The smallest value is 12 and the highest value is 23. Grouping the data into intervals 
10-19, 20-29, ... would give you very little information. 


Choose a sensible number of intervals, usually between 5 and 10. Since you must use intervals of equal width, you could use intervals of 2 °C and consider 12-13, 14-15, 16-17, 18-19, 20-21, 22-23.


First do a preliminary plot and then arrange the entries in each leaf in order. 


Preliminary plot:

Stem | Leaf
1 | 2 3 2
1 | 4 5
1 | 6 6 7 7 6 7
1 | 9 9 9 9 8 9 9 9 9
2 | 0 0 0 0 0
2 | 2 3 2 2 2

Final diagram:

Stemplot to show maximum temperatures

Stem | Leaf
1 | 2 2 3
1 | 4 5
1 | 6 6 6 7 7 7
1 | 8 9 9 9 9 9 9 9 9
2 | 0 0 0 0 0
2 | 2 2 2 2 3

Key: 1|2 means 12 °C

The modal temperature is 19 °C.


NOTE: The stem does not necessarily represent the tens digit. For example, suppose you want to use intervals 12-14, 15-17, 18-20, 21-23. The interval 18-20 cannot be represented by a stem of 1, since the tens digit changes during the interval. For the stem you can use 12, 15, 18 and 21. The leaf is then given as the number that is added to the stem.

Stem | Leaf
12 | 0 0 1 2
15 | 0 1 1 1 2 2 2
18 | 0 1 1 1 1 1 1 1 1 2 2 2 2 2
21 | 1 1 1 1 2

Key 15|2 means 17 °C

NOTE: The key is essential in explaining how the stemplot has been formed. 


In a stem and leaf diagram, or stemplot,
(a) equal intervals must be chosen,
(b) a key is essential.
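The idea in the NOTE above (stems 12, 15, 18, 21 with leaf = value − stem) can be sketched in Python as follows. This is a minimal illustration, assuming the 30 June temperatures are read back from the ordered stemplot (the day-by-day order is not needed for an ordered diagram); `stemplot_width` is a hypothetical name.

```python
# June maximum temperatures (°C), read back from the ordered stemplot
temps = ([12, 12, 13, 14, 15] + [16] * 3 + [17] * 3 + [18]
         + [19] * 8 + [20] * 5 + [22] * 4 + [23])

def stemplot_width(data, first_stem, width):
    """Stems first_stem, first_stem + width, ...; each leaf is value - stem."""
    lines = []
    stem = first_stem
    while stem <= max(data):
        leaves = sorted(x - stem for x in data if stem <= x < stem + width)
        lines.append(f"{stem} | {' '.join(map(str, leaves))}".rstrip())
        stem += width
    return lines

print("\n".join(stemplot_width(temps, 12, 3)))
print("Key 15|2 means 17 °C")
```

Every interval has the same width because the stem advances by a fixed step, which is the first rule in the summary; the key must still be printed alongside.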


Example 1.2 
The table gives the number of days on which rain fell in 36 consecutive intervals of 30 days. 


21 19  6 12  8 18  9  8 11 17 15 13
16  9 17 18  9 24 17  7  8 17 17  8
 7 11 16 17  8  5 13 22 20 16 20 13


Draw stem and leaf diagrams with the following class intervals: 
(a) 5-9, 10-14, 15-19, 20-24 
(b) 4-6, 7-9, 10-12, 13-15, 16-18, 19-21, 22-24. 


Solution 1.2 
(a) Using intervals 5-9, 10-14, 15-19, 20-24 the completed stem and leaf diagram is: 


Stem | Leaf
0 | 5 6 7 7 8 8 8 8 8 9 9 9
1 | 1 1 2 3 3 3
1 | 5 6 6 6 7 7 7 7 7 7 8 8 9
2 | 0 0 1 2 4

Key 1|5 means 15


NOTE: The stem and leaf diagram could have been written differently, as follows: 


Stem | Leaf
5 | 0 1 2 2 3 3 3 3 3 4 4 4
10 | 1 1 2 3 3 3
15 | 0 1 1 1 2 2 2 2 2 2 3 3 4
20 | 0 0 1 2 4

Key 15|1 means 16
    5|3 means 8


(b) Using intervals 4-6, 7-9, 10-12, ... the completed diagram, arranged in order, is:

Stem | Leaf
4 | 1 2
7 | 0 0 1 1 1 1 1 2 2 2
10 | 1 1 2
13 | 0 0 0 2
16 | 0 0 0 1 1 1 1 1 1 2 2
19 | 0 1 1 2
22 | 0 2

Key 13|2 means 15


Both diagrams show that the mode is 17 rainy days, but the seven intervals used in (b) 
show more clearly the two peaks, illustrating that the distribution is approximately 
bi-modal, with modal classes 7-9 and 16-18. 
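To see the two peaks numerically, a small Python sketch can count the frequency of each class for both choices of interval. It assumes the 36 values are as transcribed in the table of Example 1.2; `class_counts` is a hypothetical helper for integer data.

```python
# Number of rainy days in each of the 36 periods of 30 days (Example 1.2)
rain = [21, 19, 6, 12, 8, 18, 9, 8, 11, 17, 15, 13,
        16, 9, 17, 18, 9, 24, 17, 7, 8, 17, 17, 8,
        7, 11, 16, 17, 8, 5, 13, 22, 20, 16, 20, 13]

def class_counts(data, first, width, n_classes):
    """Frequencies for the integer classes first..first+width-1, and so on."""
    counts = {}
    for k in range(n_classes):
        lo = first + k * width
        hi = lo + width - 1
        counts[f"{lo}-{hi}"] = sum(1 for x in data if lo <= x <= hi)
    return counts

print(class_counts(rain, 5, 5, 4))  # intervals used in (a)
print(class_counts(rain, 4, 3, 7))  # intervals used in (b)
```

With the wider intervals of (a) the two busiest classes are 5-9 and 15-19. The narrower intervals of (b) separate the peaks into 7-9 and 16-18, with a clear dip in between.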


Example 1.3 


Look at this stem and leaf diagram and, for each of the three keys provided, give
(a) the value ringed,
(b) the width of the interval containing the ringed value.

[Stem and leaf diagram: 30 values on stems 0, 1 and 2, each line of the diagram covering two leaf values; the ringed value is 1|9]

(i) the widths of 30 metal components: Key 1|2 means 1.2 cm
(ii) the reaction times of 30 volunteers: Key 1|2 means 12 hundredths of a second
(iii) the attendance at 30 matches: Key 1|2 means 1200 people


Solution 1.3 


(i) (a) 1|9 means 1.9 cm.
(b) The interval is 1.8 cm-1.9 cm. Since width is a continuous variable, and assuming that widths have been measured to the nearest tenth of a centimetre, 1.75 cm ≤ width < 1.95 cm and the class width is 0.2 cm, i.e. 2 mm.
(ii) (a) 1|9 means 19 hundredths of a second, i.e. 0.19 seconds.
(b) The interval is 0.18 s-0.19 s, i.e. 0.175 ≤ time < 0.195, so the class width is 0.02 seconds.
(iii) (a) 1|9 means 1900 people.
(b) The interval is 1800 people-1900 people. Assuming that the number has been given to the nearest hundred, 1750 ≤ number < 1950, so the class width is 200 people.
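The reasoning in each part (b) follows one rule: if values are recorded to the nearest `unit`, the class boundaries lie half a unit below the lowest recorded value and half a unit above the highest. A minimal Python sketch of that rule (the function `class_limits` is hypothetical; results are floats, so expect tiny rounding differences such as 1.7500000000000002):

```python
def class_limits(lowest, highest, unit):
    """Class boundaries and width for recorded values lowest..highest,
    where each value was given to the nearest `unit`."""
    lcb = lowest - unit / 2
    ucb = highest + unit / 2
    return lcb, ucb, ucb - lcb

print(class_limits(1.8, 1.9, 0.1))     # (i) widths of components, cm
print(class_limits(0.18, 0.19, 0.01))  # (ii) reaction times, seconds
print(class_limits(1800, 1900, 100))   # (iii) attendances
```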


Back-to-back stemplots 


Stem and leaf diagrams can be used to compare two samples by showing the results together 
on a back-to-back stemplot. 


Example 1.4 


Use a stem and leaf diagram to compare the examination marks in French and English for a 
class of 20 pupils. 


French    75 69 58 58 46 44 32 50 53 78
          81 61 61 45 31 44 53 66 47 57

English   52 58 68 77 38 85 43 44 36 65
          65 79 44 71 84 72 63 69 72 79


Solution 1.4 


The first four entries for French (75, 69, 58, 58) and for English (52, 58, 68, 77) are entered into a back-to-back stemplot as follows:

Key (French): 9|6 means 69        Key (English): 5|2 means 52

     French |   | English
            | 3 |
            | 4 |
        8 8 | 5 | 2 8
          9 | 6 | 8
          5 | 7 | 7
            | 8 |

The completed diagram, before rearranging, is:

     French |   | English
        1 2 | 3 | 8 6
  7 4 5 4 6 | 4 | 3 4 4
7 3 3 0 8 8 | 5 | 2 8
    6 1 1 9 | 6 | 8 5 5 3 9
        8 5 | 7 | 7 9 1 2 2 9
          1 | 8 | 5 4

The final diagram, arranged in order:

     French |   | English
        2 1 | 3 | 6 8
  7 6 5 4 4 | 4 | 3 4 4
8 8 7 3 3 0 | 5 | 2 8
    9 6 1 1 | 6 | 3 5 5 8 9
        8 5 | 7 | 1 2 2 7 9 9
          1 | 8 | 4 5


From the diagram it is clear that the class had higher marks in English than in French and it 
appears that they performed better in English. This would, however, depend on the standards 
of marking used in the two examinations. 
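A back-to-back stemplot can also be generated programmatically. The Python sketch below is an illustration under stated assumptions: it takes the stem as the tens digit, writes the French leaves in decreasing order so that they increase away from the stem, and right-aligns them in a fixed-width column. The helpers `leaves_by_stem` and `back_to_back` are hypothetical.

```python
from collections import defaultdict

# Marks from Example 1.4
french = [75, 69, 58, 58, 46, 44, 32, 50, 53, 78,
          81, 61, 61, 45, 31, 44, 53, 66, 47, 57]
english = [52, 58, 68, 77, 38, 85, 43, 44, 36, 65,
           65, 79, 44, 71, 84, 72, 63, 69, 72, 79]

def leaves_by_stem(data):
    """Map each stem (tens digit) to the list of its leaves (units digits)."""
    table = defaultdict(list)
    for x in data:
        table[x // 10].append(x % 10)
    return table

def back_to_back(left, right, width=11):
    """Ordered back-to-back stemplot; left-hand leaves increase away from the stem."""
    lt, rt = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(lt), min(rt)), max(max(lt), max(rt)) + 1)
    rows = []
    for s in stems:
        l_part = " ".join(str(d) for d in sorted(lt[s], reverse=True))
        r_part = " ".join(str(d) for d in sorted(rt[s]))
        rows.append(f"{l_part:>{width}} | {s} | {r_part}")
    return rows

rows = back_to_back(french, english)
print(f"{'French':>11} |   | English")
print("\n".join(rows))
```

The output reproduces the final diagram above, which makes it easy to see that the English leaves sit further down the stems than the French ones.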


Exercise 1a Stemplots


1. (a) Draw a stemplot to show the masses, correct 
to the nearest kilogram, of 30 men. 
Use intervals 50-54, 55-59, 60-64, ... 

(b) Write down the modal mass. 

74 52 67 68 71 76 86 81 73
68 64 75 71 61 63 57 67 57
59 72 79 64 70 74 77 79 65
68 76 83


2. A teacher recorded the times taken by 20 boys to 
swim one length of the pool. 


The times are given to the nearest second. 


Using intervals 24-25, 26-27, ..., draw a stem
and leaf diagram to illustrate the results.

32 31 26 27 27 32 29 26 25 25 
29 31 32 26 30 24 32 27 26 31 


3. A group of adults took part in an experiment 
which measured their reaction times. The results 
were given to the nearest hundredth of a second. 
0.14 0.17 0.21 0.20 0.20 0.22 
0.14 0.24 0.26 0.17 0.14 0.17 
0.21 0.20 0.22 0.14 0.24 0.26 
0.17 0.18 0.17 0.21 0.20 0.23 
0.17 0.23 0.21 0.23 0.24 0.23 
Use intervals 0.14-0.15, 0.16-0.17,
0.18-0.19, ... to draw a stemplot to illustrate the
results. Comment on your diagram.


4. In a lesson on measurement, 30 pupils estimated
the length of a line in centimetres and wrote
down their value correct to the nearest mm.
Using intervals 3.0-3.9, 4.0-4.9, ..., draw a
stemplot.


9.2 7.3 7.0 6.5 5.4 5.3 10.1 8.4
8.8 7.1 7.6 7.9 6.7 9.6 5.5 7.4
7.0 8.2 5.5 7.8 8.2 7.5 6.1 6.1
3.9 6.8 7.6 8.1 8.0 10.0


5. The daily hours of sunshine in London during 
August were 


7.0 7.6 12.5 12.9 8.3 9.7 8.4 11.1
7.5 7.5 9.8 10.4 11.6 11.3 7.3 7.8
6.8 6.2 6.1 5.6 5.6 5.8 4.8 4.3
0.0 0.6 0.8 1.6 0.2 2.4 2.6


Illustrate these data on a stem and leaf diagram 
and comment. 


6. A stemplot is given below but it does not have a 
key. 


Stem Leaf 
5 9 
6 14 
6 789 
71 233@ 
7 | 566678 
8 034 
8 5 


State the value ringed and the width of the
interval that it is in when the diagram illustrates

(a) the times taken for a journey, where 6|8
represents 6.8 hours,

(b) the masses, in g to three decimal places, of
components, where 6|8 represents 0.068 g.


7. Draw back-to-back stemplots for the following 
data. What conclusions can you draw? 


(a) The pulse rates of 30 company directors 
were measured before and after taking 
exercise. 

Before: 110, 93, 81, 75, 73, 73, 48, 53, 69, 
69, 66, 111, 105, 93, 90, 50, 57, 64, 90, 
111, 91, 70, 70, 51, 79, 93, 105, 51, 66, 93. 
After: 117, 81, 77, 108, 130, 69, 77, 84, 84, 
86, 95, 125, 96, 104, 104, 137, 143, 70, 80, 
131, 145, 106, 130, 109, 137, 75, 104, 75, 
97, 80. 

(Use class intervals 40-49, 50-59, 60-69, ...) 


(b) The ages of teachers in two schools: 

School A: 51, 45, 33, 37, 37, 27, 28, 54, 54, 
61, 34, 31, 39, 23, 53, 59, 40, 46, 48, 48, 
39, 33, 25, 31, 48, 40, 53, 51, 46, 45, 45,
48, 39, 29, 23, 37. 

School B: 59, 56, 40, 43, 46, 38, 29, 52, 54, 
34, 23, 41, 42, 52, 50, 58, 60, 45, 45, 56, 
59, 49, 44, 36, 38, 25, 56, 36, 42, 47, 50, 
54, 59, 47, 58, 57. 

(Use class intervals 20-29, 30-39, 40-49, ...) 


(c) 20 boys and 20 girls took part in a reaction-timing
experiment. Their results were
measured to the nearest hundredth of a
second.

Girls: 0.22, 0.21, 0.18, 0.18, 0.16, 0.19,
0.25, 0.22, 0.17, 0.19, 0.16, 0.21, 0.24,
0.22, 0.19, 0.22, 0.25, 0.17, 0.22.
Boys: 0.14, 0.20, 0. 19, 0.16, 
0.15, 0.23, 0.23, 0.1 E 

0.23, 0.11, 0.21, 0.2: i 
(Use class intervals 0.0 
0.12-0.13, ...) 


0.22. 
2, 0.16, 
9, 0.16 
2, 0. 
g— 


WAYS OF GROUPING DATA 


The following frequency distributions show some of the ways that data can be grouped. The 
information is more concise than the raw data, but the disadvantage is that the original 
information has been lost. 


(i) Frequency distribution to show the lengths, to the nearest millimetre, of 30 rods 


Length (mm)   27-31   32-36   37-46   47-51
Frequency     4       11      12      3

The interval 27-31 means 26.5 mm ≤ length < 31.5 mm.
The class boundaries are 26.5, 31.5, 36.5, 46.5, 51.5
The class widths are 5, 5, 10, 5


(ii) Frequency distribution to show the marks in a test of 100 students

Mark        30-39   40-49   50-59   60-69   70-79   80-99
Frequency   0       14      26      20      18      42

This distribution can be interpreted in two ways:

(a) As discrete data, the interval 30-39 represents 30 ≤ mark < 40.
The class boundaries are 30, 40, 50, 60, 70, 80, 100
The class widths are 10, 10, 10, 10, 10, 20

(b) As continuous data, assuming marks are given to the nearest integer, 30-39 would represent 29.5 ≤ mark < 39.5.
The class boundaries are 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 99.5
The class widths are 10, 10, 10, 10, 10, 20

(iii) Frequency distribution to show the lengths of 50 telephone calls

Length of call (min)   0-   3-   6-   9-   12-   18-
Frequency              9    12   15   10   4     0

The interval '3-' means 3 minutes ≤ time < 6 minutes, so any time from 3 minutes up to (but not including) 6 minutes comes into this interval.
The class boundaries are 0, 3, 6, 9, 12, 18
The class widths are 3, 3, 3, 3, 6

(iv) Frequency distribution to show the masses of 40 packages brought to a particular counter at a post office

Mass (g)    -100   -250   -500   -800
Frequency   8      10     16     6

The interval '-250' means 100 g < mass ≤ 250 g, so any mass over 100 grams up to and including 250 grams comes into this interval.
The class boundaries are 0, 100, 250, 500, 800
The class widths are 100, 150, 250, 300

(v) Frequency distribution to show the speeds of 50 cars passing a checkpoint

Speed (km/h)   20-30   30-40   40-60   60-80   80-100
Frequency      2       7       20      16      5

The interval 30-40 means 30 km/h ≤ speed < 40 km/h.
The class boundaries are 20, 30, 40, 60, 80, 100
The class widths are 10, 10, 20, 20, 20

(vi) Frequency distribution to show ages (in completed years) of applicants for a teaching post

Age (years)   21-24   25-28   29-32   33-40   41-52
Frequency     4       2       2       1       1

Since the ages are given in completed years (not to the nearest year), '21-24' means 21 ≤ age < 25. Someone who is 24 years and 11 months would come into this category. Sometimes this interval is written '21-' and the next is '25-', etc.
The class boundaries are 21, 25, 29, 33, 41, 53
The class widths are 4, 4, 4, 8, 12

HISTOGRAMS

Grouped data can be displayed in a histogram as in the following diagram.

[Figure: histogram of the ages of passengers; frequency density against age of passengers (years), from 0 to 100, with a key showing the area that represents one passenger]

The histogram represents the following table for the distribution of ages of passengers on a shuttle flight from Denver, Colorado to Salt Lake City, Utah.

Age, x years   0 ≤ x < 20   20 ≤ x < 40   40 ≤ x < 50   50 ≤ x < 70   70 ≤ x < 100
Frequency      4            44            36            28            6

Histograms resemble bar charts, but there are two important differences. In a histogram

• there are no gaps between the bars,
• the area of each bar is proportional to the frequency that it represents. This means that total area ∝ total frequency.


Histograms often have bars of varying widths, so the height of each bar must be adjusted in accordance with its width.

The vertical axis is labelled not frequency but frequency density, where

frequency density = frequency ÷ interval width


Consider the interval 20 ≤ x < 40 in the frequency table above.
Frequency = 44 and interval width = 20, so frequency density = 44 ÷ 20 = 2.2.


The complete table looks like this: 


Age, x years   Interval width   Frequency   Frequency density
0 ≤ x < 20     20               4           0.2
20 ≤ x < 40    20               44          2.2
40 ≤ x < 50    10               36          3.6
50 ≤ x < 70    20               28          1.4
70 ≤ x < 100   30               6           0.2


Modal class 


The highest bar in the histogram represents the interval 40 ≤ x < 50. This is the modal class.
Notice that in the table this interval does not have the greatest frequency, but it does have the greatest frequency density.


In a grouped frequency distribution, the modal class is the interval with the greatest frequency density, i.e. the interval represented by the highest bar in the histogram.
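The calculation of frequency densities and the choice of modal class can be sketched in Python as follows, using the passenger ages table. Representing each class as a (lower boundary, upper boundary, frequency) tuple, and the name `density_table`, are assumptions made for this illustration.

```python
# Ages of passengers: (lower boundary, upper boundary, frequency)
classes = [(0, 20, 4), (20, 40, 44), (40, 50, 36), (50, 70, 28), (70, 100, 6)]

def density_table(rows):
    """Return (lcb, ucb, width, frequency, frequency density) for each class."""
    return [(lo, hi, hi - lo, f, f / (hi - lo)) for lo, hi, f in rows]

table = density_table(classes)
for lo, hi, width, f, fd in table:
    print(f"{lo} <= x < {hi}: width {width}, frequency {f}, density {fd:g}")

# The modal class has the greatest frequency DENSITY, not the greatest frequency
modal = max(table, key=lambda row: row[4])
print("Modal class:", modal[0], "<= x <", modal[1])        # 40 <= x < 50
biggest = max(classes, key=lambda row: row[2])
print("Greatest frequency:", biggest[0], "<= x <", biggest[1])  # 20 <= x < 40
```

The last two lines show the point made above: choosing by raw frequency would wrongly pick 20 ≤ x < 40.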


Example 1.5 


The grouped frequency distribution records the masses, to the nearest gram, of 84 letters 
delivered by the postman. 


Mass (g)            1-20   21-40   41-60   61-80   81-100
Number of letters   10     18      24      14      18


Draw a histogram to illustrate these data. 


Solution 1.5 


The data are continuous. 
The class boundaries are 0.5, 20.5, 40.5, 60.5, 80.5, 100.5 
The interval widths are 20, 20, 20, 20, 20 


In this example all the intervals are of equal width and you could use the frequency for the 
height of the bar. It is, however, a good idea to use the frequency density for the height of the 
bar. The resulting histogram will then have a total area which represents the total frequency. 


Mass (g)           Interval width   Frequency   Frequency density
0.5 ≤ x < 20.5     20               10          0.5
20.5 ≤ x < 40.5    20               18          0.9
40.5 ≤ x < 60.5    20               24          1.2
60.5 ≤ x < 80.5    20               14          0.7
80.5 ≤ x < 100.5   20               18          0.9
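A quick Python check, offered as a sketch, that using frequency density for the bar heights makes the total area equal the total frequency (84 letters). Floating-point arithmetic may leave a tiny rounding error, so the comparison should use a tolerance rather than exact equality.

```python
# Example 1.5: letters grouped by mass, class boundaries 0.5, 20.5, ..., 100.5
boundaries = [0.5, 20.5, 40.5, 60.5, 80.5, 100.5]
frequencies = [10, 18, 24, 14, 18]

widths = [u - l for l, u in zip(boundaries, boundaries[1:])]
densities = [f / w for f, w in zip(frequencies, widths)]   # bar heights
total_area = sum(w * d for w, d in zip(widths, densities))  # sum of width x height

print(densities)
print(total_area, sum(frequencies))
```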


Histogram to show the masses of letters

[Figure: frequency density against mass of letter (g), with class boundaries 0.5, 20.5, 40.5, 60.5, 80.5, 100.5 and a key showing the area that represents 1 letter]


The main purpose of histograms is to illustrate grouped continuous data, but they can also be 
used to illustrate grouped discrete data. 


Example 1.6 


These are the examination marks for a group of 120 first year statistics students. 


Mark        0-9   10-19   20-29   30-49   50-79
Frequency   8     21      53      28      10

Represent the data in a histogram and comment on the shape of the distribution.


Solution 1.6

The data are discrete, so, to avoid gaps in the histogram, use class boundaries 9.5, 19.5, 29.5, 49.5. This leads to -0.5 and 79.5 as the remaining two boundaries, even though these marks are outside the range of the discrete data.
The class boundaries are -0.5, 9.5, 19.5, 29.5, 49.5, 79.5
The interval widths are 10, 10, 10, 20, 30

Mark    Class width   Frequency   Frequency density
0-9     10            8           0.8
10-19   10            21          2.1
20-29   10            53          5.3
30-49   20            28          1.4
50-79   30            10          0.333

Histogram to show examination marks

[Figure: frequency density against marks, with class boundaries -0.5, 9.5, 19.5, 29.5, 49.5, 79.5]

The distribution has a long tail of values to the right. It is said to be positively skewed.

HINT: When drawing the histogram you will find it easier to mark out the horizontal axis -0.5, 9.5, 19.5, ... using the lines of your squared paper. Then draw in the vertical frequency density axis in a suitable position. Anywhere will do for this; it does not have to go through (0, 0), but could be to the left of -0.5, for example.

Finding the frequencies from a histogram

To find the frequency in each interval, use

frequency = interval width × frequency density

Example 1.7

A Passengers' Association conducted a survey on the punctuality of trains using a particular station. The histogram illustrates the results.
(a) Construct the frequency distribution.
(b) How many trains were there in the survey?

Histogram to show lateness of trains

[Figure: frequency density against number of minutes late (t), from 0 to 80]

Solution 1.7

(a) To find the frequency in each interval, use frequency = interval width × frequency density.

Number of minutes late (t)   Frequency
0 ≤ t < 5                    5 × 6.4 = 32
5 ≤ t < 10                   5 × 8.8 = 44
10 ≤ t < 20                  10 × 2.8 = 28
20 ≤ t < 30                  10 × 1.2 = 12
30 ≤ t < 50                  20 × 0.6 = 12
50 ≤ t < 80                  30 × 0.2 = 6

(b) Number of trains = 32 + 44 + 28 + 12 + 12 + 6 = 134
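The rule frequency = interval width × frequency density can be applied in one line of Python. In this sketch the densities are those read from the histogram in Example 1.7; `round` is used because products of decimal floats such as 5 × 8.8 may not be exact, while frequencies must be whole numbers.

```python
# Example 1.7: (lower boundary, upper boundary, frequency density) from the histogram
bars = [(0, 5, 6.4), (5, 10, 8.8), (10, 20, 2.8), (20, 30, 1.2),
        (30, 50, 0.6), (50, 80, 0.2)]

# frequency = interval width x frequency density, rounded to a whole number
frequencies = [round((hi - lo) * fd) for lo, hi, fd in bars]

print(frequencies)       # [32, 44, 28, 12, 12, 6]
print(sum(frequencies))  # 134
```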


Example 1.8 


The number of letters delivered to the houses in Distribution Street is illustrated in the 
histogram. Given that 13 houses received three or four letters, how many houses are there in 
the street? Explain the scale on the vertical axis. 


[Figure: histogram; vertical axis: frequency density (scale not marked); horizontal axis: number of letters delivered]


Solution 1.8 


The scale on the frequency density axis has not been marked, but since you are given that there are 13 houses in the interval 3-4, it is easy to see that the area of four small squares represents one house.

The frequencies can be deduced directly from this; for example, the interval 7-10 contains two houses.


Total frequency = 5 + 13 + 10 + 2 = 30
There are 30 houses in the street. 


To work out the scale on the frequency density axis, note that the interval 3-4 has frequency 13 and is of width 2 (its boundaries are 2.5 and 4.5), therefore frequency density = 13 ÷ 2 = 6.5.


Since the bar is 13 squares high, each square on the vertical axis represents a frequency 
density of 0.5. 


Although it is easier to use frequency density for the vertical scale in the histogram, other 
scales can be used, provided that area is proportional to frequency. This is illustrated in the 
following example. 


Example 1.9 


A teacher recorded the time, to the nearest minute, spent reading during a particular day by 
each child in a group. The times were summarised in a grouped frequency distribution and 
represented by a histogram. The first class in the grouped frequency distribution was 10-19 
and its associated frequency was eight children. On the histogram the height of the rectangle 
representing the class was 2.4 cm and the width was 2 cm. The total area under the histogram
was 53.4 cm².


Find the number of children in the group. (L) 


Solution 1.9 
Rectangle representing the 10-19 interval: width 2 cm, height 2.4 cm

Area of rectangle = 2 × 2.4 = 4.8 cm²

Area ∝ frequency
Area = k × frequency
4.8 = k × 8
k = 0.6
Total area = k × total frequency
53.4 = 0.6 × total frequency

Total frequency = 53.4 ÷ 0.6 = 89


There were 89 children in the group. 
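The proportionality argument of Solution 1.9 translates directly into a few lines of Python. This is a sketch only; the variable names are illustrative, and `round` absorbs the small floating-point error in 53.4 ÷ 0.6.

```python
# Example 1.9: area of a bar is proportional to frequency, area = k * frequency
first_bar_area = 2 * 2.4          # cm^2: width 2 cm, height 2.4 cm
first_bar_frequency = 8
k = first_bar_area / first_bar_frequency   # cm^2 per child

total_area = 53.4                          # cm^2
children = round(total_area / k)
print(children)  # 89
```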


FREQUENCY POLYGONS 


A grouped frequency distribution can be displayed as a frequency polygon. 


To construct a frequency polygon, for each interval plot frequency density against the 
mid-interval value, where 


mid-interval value = ½(lower class boundary + upper class boundary)


Then join the points with straight lines. 


Example 1.10 


Draw a frequency polygon to illustrate this frequency distribution which gives the times taken 
by 31 competitors to complete a cross-country run. 


Time t (min)   25 ≤ t < 30   30 ≤ t < 35   35 ≤ t < 40   40 ≤ t < 50   50 ≤ t < 65
Frequency      4             12            8             4             3

Solution 1.10

Time          Mid-interval value   Interval width   Frequency   Frequency density
25 ≤ t < 30   27.5                 5                4           0.8
30 ≤ t < 35   32.5                 5                12          2.4
35 ≤ t < 40   37.5                 5                8           1.6
40 ≤ t < 50   45                   10               4           0.4
50 ≤ t < 65   57.5                 15               3           0.2

Frequency polygon to show times taken to complete a cross-country run

[Figure: frequency density against time (min), from 25 to 65]


Note that this distribution is skewed with a tail at the right-hand end, i.e. it is positively skewed.


You could of course construct the histogram first and then join the mid-points of the tops of 
the rectangles to give the frequency polygon. 
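A minimal Python sketch of the construction computes the vertices (mid-interval value, frequency density) for the cross-country data of Example 1.10. Plotting them and joining them with straight lines is left to whatever graphing tool you use; no plotting library is assumed here.

```python
# Example 1.10: (lower boundary, upper boundary, frequency)
classes = [(25, 30, 4), (30, 35, 12), (35, 40, 8), (40, 50, 4), (50, 65, 3)]

# Vertices of the frequency polygon: (mid-interval value, frequency density)
points = [((lo + hi) / 2, f / (hi - lo)) for lo, hi, f in classes]

for x, y in points:
    print(f"({x:g}, {y:g})")
```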


Comparative frequency polygons 


Frequency polygons are very useful when comparing sets of data. 


Example 1.11 


Draw frequency polygons to compare the age distribution of the teachers in two sixth form colleges:

Age         20-   25-   30-   35-   40-   45-   50-   55-   60-   65-
College A   4     6     11    14    9     5     5     3     0     0
College B   0     2     4     7     11    12    11    8     3     0


Solution 1.11 


Work out the mid-interval value for each interval; for example, in the interval '20-' the lower boundary is 20 and the upper boundary is 25, so mid-interval value = ½(20 + 25) = 22.5.

The width of each interval is 5, so work out the frequency densities for each college by dividing the frequencies by 5.


Mid-interval value   Frequency density, College A   Frequency density, College B
22.5                 0.8                            0
27.5                 1.2                            0.4
32.5                 2.2                            0.8
37.5                 2.8                            1.4
42.5                 1.8                            2.2
47.5                 1                              2.4
52.5                 1                              2.2
57.5                 0.6                            1.6
62.5                 0                              1
67.5                 0                              0

[Figure: frequency polygons for College A and College B; frequency density against age (years), from 20 to 70]


The bulk of the distribution for College A is further to the left than College B. This indicates 
that College A has a much younger staff than College B. 


Notice that in this example, since all the intervals are of equal width, frequency could have 
been used on the vertical axis. 


FREQUENCY CURVES 


When the number of intervals is large, the frequency polygon consists of a large number of line segments. The frequency polygon then approaches a smooth curve, known as a frequency curve.


The shape of a distribution 


If distributions represented by a vertical line graph or a histogram are illustrated using a 
frequency curve, it is easier to see the general ‘shape’ of the distribution. For example: 


(a) Positive skew

In a positively skewed distribution, there is a long tail at the positive end of the distribution.

A positively skewed distribution could occur when considering, for example,

• the number of children in a family,
• the age at which women marry,
• the distribution of wages in a firm.

(b) Negative skew

In a negatively skewed distribution, there is a long tail at the negative end of the distribution.

A negatively skewed distribution could occur when considering, for example,

• reaction times for an experiment,
• daily maximum temperatures for a month in the summer.


(c) Reverse J-shape

In a reverse J-shaped distribution an initial 'bulge' is followed by a long tail.


(d) Uniform or rectangular

In a uniform or rectangular distribution the data are evenly spread throughout the range.


(e) The normal distribution

This symmetrical, bell-shaped distribution is known as a normal distribution.


An approximately normal distribution occurs when measuring quantities such as heights, masses and examination marks.


Exercise 1b Histograms and frequency polygons


1. A researcher timed how long it took for each of 38 volunteers to perform a simple task. The results are shown in the table.

Time (seconds)   5-   10-   20-   25-   40-   45-
Frequency        Seca Yay S287 2. 0

Draw a histogram to illustrate the data.

2. In a survey the masses of 50 apples were noted and recorded in the following table. Each value was given to the nearest gram.

86 101 114 118 87 92 93 116
1 96 117
100 106 118 101 107 96 101 102
(04-93. “86: 107° 9e-d0s 119.400
103 108 92 109 95 100 103 110
113 99 106 116 101 105 86 88
108 92

(a) Construct a frequency distribution, using equal class intervals of width 5 g and taking the first interval as 85-89.
(b) Draw a histogram to illustrate the data and write down the modal class.
(c) Draw a stemplot to illustrate the data and write down the mode.

3. On a particular day the length of stay of each car at a city car park was recorded:

Length of stay (min)   Frequency
t < 25                 62
25 ≤ t < 60            70
60 ≤ t < 80            88
80 ≤ t < 150           280
150 ≤ t < 300          30

Represent the data by a histogram and state the modal class.

4. Draw a histogram to show the masses, measured to the nearest kilogram, of 200 girls.

Mass (kg)   41-50   51-55   56-60   61-70   71-75
Frequency   21      62      5S      30      412


5. This histogram represents the speeds of cars passing a 30 miles per hour sign. Write out the frequency distribution.

[Figure: histogram; vertical axis: frequency density; horizontal axis: speed (mph)]
6. In a competition to grow the tallest hollyhock, the heights recorded by 50 primary school children were as follows. Heights were measured to the nearest centimetre.

Height (cm)   Frequency
177-186       12
187-191       8
192-196       8
197-201       9
202-206       7
207-216       6

Draw a histogram and superimpose a frequency polygon.


7. The table shows the duration, in minutes, of 64 telephone calls made from a High Street call box in a day.

Length of call (min)   Frequency
0-                     3
…                      7
3-                     22
6-                     20
12-                    6
15-                    6
21-                    0

Draw a frequency polygon to illustrate the data.


8. These are the numbers of times the letter 'e' appears in each sentence in an article called 'My Kind of Day'. Make a grouped frequency distribution and draw a histogram.


15 12 8 12 3 10 1417 5 38 if 
7165 13 12 11 6 74178 21 


12. 


9. The table shows the ages, in completed years, of women who gave birth to a child at Anytown Maternity Hospital during a particular year. Without drawing a histogram first, draw a frequency polygon to illustrate the information. Describe the distribution.

Age (years)   Number of births
16-           70
20-           470
25-           535
30-           280
35-           118
45-           0


10. The patients at a chest clinic were asked to keep a record of the number of cigarettes they smoked each day.

Number of cigarettes smoked per day   Frequency
0-9                                   5
10-14                                 8
15-19                                 32
20-29                                 41
30-39                                 16
40 and over                           2

Draw a histogram to represent these data.


11. The marks awarded to 136 students in an examination are summarised in the table. Draw a histogram to illustrate the data.

Marks   Frequency
10-29   22
30-39   18
40-49   22
50-59   24
60-64   14
65-69   12
70-84


12. [Figure: frequency polygon; vertical axis: frequency density; horizontal axis: length (cm)]

Complete the frequency distribution represented by the frequency polygon above.

Length, x (cm)   Frequency
0 ≤ x < 4        2
4 ≤ x < 8
8 ≤ x < 12
12 ≤ x < 16
16 ≤ x < 18
18 ≤ x < 20
20 ≤ x < 30


13. Lucy and Jack play a computer game every day 
and keep a record of their scores. Lucy’s scores 
are shown in the table. Draw a frequency 
polygon to represent her scores. 


Lucy's scores   50-99   100-149   150-199   200-249   250-299
Frequency       6       14        10        6         4

Jack's scores are as follows:

Jack's scores   50-99   100-149   150-199   200-249   250-299
Frequency       2       6         10        16        6


Draw a frequency polygon for Jack’s scores on 
the same set of axes as Lucy’s and use it to 
compare the two sets of scores. 


14. Students were investigating the effects of a growth hormone placed on the growing tip of a maize seedling. The hormone was used in two different concentrations and distilled water was used as a control on a third set of seedlings. After three weeks the heights of the plants were measured to the nearest centimetre. They are shown in the table. Draw frequency polygons to represent the data and compare the results.

Control
Height (cm)   45   46   47   48   49   50   51   52   53   54   55   56
Frequency     0    7    …    12   14   14   18   42   8    3    1    0

20% solution
Height (cm)   50   51   52   53   54   55   56   57   58   59   60   61
Frequency     0    1    0    2    5    3    17   25   20   12   9    0

40% solution
Height (cm)   54   55   56   57   58   59   60   61   62   63   64   65   66
Frequency     0    2    2    2    …    10   11   18   18   16   9    2    0


15. In one month, a student recorded the length, to the nearest minute, of each of the lectures she attended. The table below shows her data and the calculations she made before drawing a histogram to illustrate these data.

Length of lecture (minutes)   50-53   54-55   56-59   60-67
Number of lectures            a       b       30      c
Frequency density             3       13      7.5     1.5

Calculate

(a) the value of a, of b and of c,
(b) the total number of lectures attended during the month. (C Additional)


CIRCULAR DIAGRAMS OR PIE CHARTS 


Pie charts are so called because they look like an apple pie! The areas of the slices or sectors of 
the pie are in proportion to the quantities being represented. 
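In practice, the rule above means that each sector's angle is that quantity's share of the total multiplied by 360°. The short Python sketch below shows this calculation. The function name `sector_angles` is our own, chosen for illustration. The data are the first-year sales from Example 1.13 (Africa, America, Asia, Europe).

```python
def sector_angles(values):
    """Return the pie-chart sector angle, in degrees, for each value.

    Each angle is (value / total) * 360, so the angles always add up to 360.
    """
    total = sum(values)
    return [value / total * 360 for value in values]


# First-year sales (millions of pounds): Africa, America, Asia, Europe.
angles = sector_angles([5.5, 6.7, 13.2, 19.6])
print([round(a, 1) for a in angles])  # [44.0, 53.6, 105.6, 156.8]
```

Rounding is done only for display. The unrounded angles still sum to 360°, apart from floating-point error.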


Example 1.12 


The pie chart, which is not drawn to scale, shows the distribution of various types of land and water in a certain county. Calculate

[Figure: pie chart, not drawn to scale, with sectors including Woodland (88°), Farmland (160°, 1200 km²) and Urban (30 km²)]


(a) the area of woodland, 
(b) the angle of the urban sector, 
(c) the total area of the county. (C) 


Solution 1.12 


(a) 160° represents 1200 km², so 88° represents $\frac{1200}{160} \times 88 = 660$ km².
Area of woodland = 660 km²

(b) 1200 km² is represented by 160°, so 30 km² is represented by $\frac{160}{1200} \times 30 = 4^\circ$.
Angle for the urban sector = 4°

(c) 160° represents 1200 km², so 360° represents $\frac{1200}{160} \times 360 = 2700$ km².
Total area of county = 2700 km²


Comparison pie charts 


Pie charts of different sizes are useful when comparing two or more populations. The area of each pie will be in proportion to the different population sizes, so if the pies are drawn with radii $r_1$ and $r_2$ and represent total population sizes $F_1$ and $F_2$, then

$\pi r_1^2 : \pi r_2^2 = F_1 : F_2$
$r_1^2 : r_2^2 = F_1 : F_2$   (dividing by $\pi$)
$r_1 : r_2 = \sqrt{F_1} : \sqrt{F_2}$   (taking square roots)

Radii should be chosen so that $\dfrac{r_1}{r_2} = \dfrac{\sqrt{F_1}}{\sqrt{F_2}}$.
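The square-root rule can be turned into a one-line calculation. The sketch below is a minimal Python version. The name `comparable_radius` is hypothetical, chosen for illustration. Given the radius of one chart and the two totals, it returns the radius of the second chart.

```python
import math


def comparable_radius(r1, total1, total2):
    """Radius r2 for a pie chart whose area is in proportion to total2.

    From r1 : r2 = sqrt(F1) : sqrt(F2), it follows that r2 = r1 * sqrt(F2 / F1).
    """
    return r1 * math.sqrt(total2 / total1)


# Four times the total needs only twice the radius.
print(comparable_radius(5, 100, 400))  # 10.0

# Totals 45 and 60 (as in Example 1.13): r2 / r1 = sqrt(60 / 45).
print(round(comparable_radius(1.0, 45, 60), 3))  # 1.155
```

The second result agrees with the ratio $1.73 : 2$, since $2 / 1.73 \approx 1.155$.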


Example 1.13 


The table shows, in millions of pounds, the sales of a company in two successive years. 


Year     Africa   America   Asia   Europe
First    5.5      6.7       13.2   19.6
Second   5.8      15.2      9.2    29.8

Draw two pie charts which allow the total annual sales to be compared. (C Additional)


Solution 1.13 


First calculate the total sales for each year and the angles in the pie charts.

Total sales (in millions of pounds):

$F_1 = 5.5 + 6.7 + 13.2 + 19.6 = 45$
$F_2 = 5.8 + 15.2 + 9.2 + 29.8 = 60$

Angles:

First year: Africa $\frac{5.5}{45} \times 360^\circ = 44^\circ$, America $\frac{6.7}{45} \times 360^\circ = 53.6^\circ$, Asia $\frac{13.2}{45} \times 360^\circ = 105.6^\circ$, Europe $\frac{19.6}{45} \times 360^\circ = 156.8^\circ$; total $360^\circ$.

Second year: Africa $\frac{5.8}{60} \times 360^\circ = 34.8^\circ$, America $\frac{15.2}{60} \times 360^\circ = 91.2^\circ$, Asia $\frac{9.2}{60} \times 360^\circ = 55.2^\circ$, Europe $\frac{29.8}{60} \times 360^\circ = 178.8^\circ$; total $360^\circ$.

Work out the ratio of the radii using
$r_1^2 : r_2^2 = F_1 : F_2 = 45 : 60 = 3 : 4$
$r_1 : r_2 = \sqrt{3} : \sqrt{4} = 1.73 : 2$

So you could take $r_1 = 1.7$ cm, $r_2 = 2$ cm, or multiples of these, e.g. $r_1 = 3.4$ cm, $r_2 = 4$ cm.

[Figure: pie charts titled 'Sales in first year' and 'Sales in second year']


Example 1.14 


On a particular Wednesday the sales of sugar from a supermarket consisted of 250 large packets, 210 medium packets and 225 small packets. The mass of sugar in a large packet is $1\frac{1}{2}$ times that in a medium packet and $2\frac{1}{2}$ times that in a small packet. Calculate the angles needed to draw a pie chart representing the total masses of sugar sold in large, medium and small packets.

The radius of the corresponding pie chart for the following Saturday's sales of sugar was double that for the Wednesday's sales. On the Saturday 900 large packets and 900 medium packets were sold. Calculate the number of small packets sold on the Saturday. (C)


Solution 1.14 


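Let the mass of a small packet be x.
Then the mass of a large packet is $2\frac{1}{2}x = \frac{5}{2}x$.
Also, you are given that
mass of a large packet = $1\frac{1}{2}$ × mass of a medium packet
so $\frac{5}{2}x = \frac{3}{2}$ × mass of a medium packet
mass of a medium packet = $\frac{5}{3}x$

Mass of 225 small packets = 225x
Mass of 210 medium packets = $210 \times \frac{5}{3}x = 350x$
Mass of 250 large packets = $250 \times \frac{5}{2}x = 625x$
Total mass = 1200x

Angle representing mass of small packets = $\frac{225}{1200} \times 360^\circ = 67.5^\circ$
Angle representing mass of medium packets = $\frac{350}{1200} \times 360^\circ = 105^\circ$
Angle representing mass of large packets = $\frac{625}{1200} \times 360^\circ = 187.5^\circ$

The pie charts represent the total masses of sugar, so their areas are in proportion to the total masses, not to the numbers of packets.
Let $M_W$ denote the total mass sold on Wednesday and $M_S$ the total mass sold on Saturday.

Then $M_W = 1200x$.
Also $r_S : r_W = 2 : 1$, so $M_S : M_W = r_S^2 : r_W^2 = 4 : 1$.
$M_S = 4 \times 1200x = 4800x$
Mass of 900 large packets = $900 \times \frac{5}{2}x = 2250x$
Mass of 900 medium packets = $900 \times \frac{5}{3}x = 1500x$
Mass of small packets sold on Saturday = 4800x - (2250x + 1500x) = 1050x
1050 small packets were sold on Saturday.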
Exercise 1c Pie charts

1. There are 28 pupils in Peter's class. He carried out a survey of how the pupils in his class travelled to school. His results are shown in the table below.

Method of travel   Number of pupils
Bus                12
Car                2
Bicycle            5
Walking            9

The data are to be illustrated by a pie chart.
(a) Calculate, to the nearest degree, the sector angles of the pie chart.
(b) Draw the pie chart using a circle of radius 5 cm, labelling each sector with the method of travel it represents.

There are 34 pupils in Shumilla's class. For these pupils she carried out the same kind of survey and drew a pie chart to show her results.
(c) Calculate, giving your answer to three significant figures, the radius of a comparable pie chart which could be used to represent the results of Shumilla's survey.

2. The following data summarise the expenditure by a county council during a particular year.

Service
Education
Highways & Public Transport
Police
Social Services
Other

These data are to be represented by a pie chart of radius 5 cm. Calculate, to the nearest degree, the angle corresponding to each of the five classifications. (Do not draw the pie chart.)

The following year the county council spent £305.2 m.

Find the radius of a comparable pie chart which could be used to represent this second set of data. (L)


3. Five companies form a group. The sales of each company during the year ending 5 April, 1988, are shown in the table below.

Company             A    B     C    D    E
Sales (in £1000s)   55   130   20   35   60

Draw a pie chart of radius 5 cm to illustrate this information.

For the year ending 5 April, 1989, the total sales of the group increased by 20%, and this growth was maintained for the year ending 5 April, 1990.

If pie charts were drawn to compare the total sales for each of these years with the total sales for the year ending 5 April, 1988, what would be the radius of each of these pie charts?

If the sales of company E for the year ending 5 April, 1990, were again £60 000, what would be the angle of the sector representing them?


4. A charity obtains its income from various sources. The table below shows these sources and the corresponding amounts of income for 1993.

Source        Income (£)
Advertising   30 000
Donations     x
Fees          9 000
Investments   3 000
Sponsorship   10 000

A pie chart was drawn to illustrate the data. Given that the angle of the sector representing Donations was 204°, calculate
(a) the total income for 1993,
(b) the value of x,
(c) the angle of each of the remaining sectors.

A second pie chart was drawn to compare the corresponding 1996 data with those of 1993. In 1996 the income from Sponsorship had increased to £28 800 and this was represented by a sector of angle 60° in the pie chart for 1996. Given that the radius of the 1996 pie chart was 9 cm, calculate the radius of the 1993 pie chart. (C)


5. A golf club has four categories of membership: men, women, juniors and social members. The pie chart shown, which is not drawn to scale, illustrates the distribution of membership in 1995. Given that there were 147 men and 35 social members, calculate
(a) the number of junior members,
(b) the angle of the sector representing the social members,
(c) the number of women.

[Figure: pie chart, not drawn to scale, with sectors including Women, Juniors and Social]

The corresponding pie chart for 2000 indicated that the number of men had increased by 49, although the angle of the corresponding sector remained the same. Calculate the total number of members in 2000.

Given that the radius of the 1995 pie chart was 26 cm, calculate the radius of the 2000 pie chart. (C Additional)


6. During a particular fortnight a family spends £52.27 on meat, £23.10 on fruit and vegetables, £19.72 on drink, £12.41 on toiletries, £102.68 on groceries and £9.82 on miscellaneous items.

These data are to be represented by a pie chart of radius 5 cm.
(a) Calculate, to the nearest degree, the angle corresponding to each of the above classifications. (Do not draw the pie chart.)

The following fortnight the family spends 20% more in total.
(b) Find the radius of a comparable pie chart to represent the data on this occasion. (L)


7. Pie charts A, B and C are drawn to compare, over a given period, the total value of the sales of certain items in each of three branches of a multiple store. The radii of the charts are 20 cm, 30 cm and 40 cm, respectively.
(a) If the total sales value represented by chart B is £4500, calculate the total sales value represented by each of charts A and C.
(b) The angle of the sector representing a particular item in chart A is 72°. Calculate the sales value of this item.
(c) The sales value represented by a sector in chart C is £600. Calculate the angle of the sector.
(d) One item occupies one quarter of chart A, and the sales value for this item is one half of that for the same item on chart B. Calculate the angle of the sector for this item on chart B. (C Additional)


8. On a certain day, 125 people, each buying one newspaper, were asked which newspaper they had bought. The results of the survey are shown in the table below.

Newspaper          Number bought
The Times          10
The Telegraph      25
The Express        40
Some other paper   50

Calculate the angles of the sectors of a pie chart of radius 5 cm which would illustrate these data.

The following day a similar survey was carried out and the radius of the pie chart necessary to compare the new set of data with the previous set was 6 cm. Calculate the number of people in the second survey. (C Additional)

9. A householder keeps an annual account of four items of expenditure. The figures for the year 1991 are shown in the table below.

Item         Expenditure (£)
Taxes        x
Travel       1000
Light/Heat   y
Telephone    300

A pie chart was drawn to illustrate these data. Given that the angles of the sectors representing Taxes and Travel were 124° and 80° respectively, calculate
(a) the total expenditure for the year,
(b) the value of x and of y,
(c) the angle of each of the remaining sectors.

In 1992, the total expenditure on the same items was £8000. Given that the radius of the pie chart for 1991 was 6 cm, calculate the radius of the pie chart for 1992 in order that the two sets of data may be compared. (C Additional)

THE MEAN

A typical or average value is useful when interpreting data. One such average is the mean.


Consider the five numbers

0.9, 1.4, 2.8, 3.1, 5.6.

The mean is $\dfrac{0.9 + 1.4 + 2.8 + 3.1 + 5.6}{5} = \dfrac{13.8}{5} = 2.76$.

Example 1.15

To obtain Grade A, Ben must achieve an average of at least 70 in five tests. If his average mark for the first four tests is 68, what is the lowest mark he can get in his fifth test and still obtain Grade A?

Solution 1.15

For the first four tests,
$\dfrac{x_1 + x_2 + x_3 + x_4}{4} = 68$
$x_1 + x_2 + x_3 + x_4 = 68 \times 4 = 272$

For five tests, Ben wants his mean mark to be at least 70.
$\dfrac{x_1 + x_2 + x_3 + x_4 + x_5}{5} \ge 70$
$\dfrac{272 + x_5}{5} \ge 70$
$272 + x_5 \ge 350$
$x_5 \ge 350 - 272$
$x_5 \ge 78$

To obtain Grade A, Ben must get at least 78 marks in his fifth test.
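The reasoning in Solution 1.15 can be written as a small Python function. This is a minimal sketch: the name `lowest_last_mark` is hypothetical, and it assumes exactly one test remains. It converts the current mean back into a total, works out the total needed for the target mean, and takes the difference.

```python
def lowest_last_mark(mean_so_far, tests_so_far, target_mean):
    """Lowest mark needed on one more test so the overall mean reaches target_mean.

    Assumes exactly one test remains after the tests already taken.
    """
    total_so_far = mean_so_far * tests_so_far            # e.g. 68 * 4 = 272
    required_total = target_mean * (tests_so_far + 1)    # e.g. 70 * 5 = 350
    return required_total - total_so_far                 # e.g. 350 - 272 = 78


print(lowest_last_mark(68, 4, 70))  # 78
```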


A shorthand way of writing $x_1 + x_2 + x_3 + x_4$ is $\sum_{i=1}^{4} x_i$.

The symbol Σ (the Greek capital letter 'sigma') is used to denote 'the sum of'. So for $x_1 + x_2 + x_3 + \dots + x_n$ you could write $\sum_{i=1}^{n} x_i$.

The mean is often denoted by $\bar{x}$, so $\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$.

This is rather cumbersome, so usually the subscript i is omitted and the mean is written $\bar{x} = \dfrac{\sum x}{n}$.
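In Python the formula $\bar{x} = \frac{\sum x}{n}$ maps directly onto the built-in `sum` and `len`. The minimal sketch below uses the five numbers from the start of this section and the data of Example 1.18. The function name `mean` is our own. The standard library also provides `statistics.mean`.

```python
def mean(xs):
    """Return x-bar = (sum of x) / n for a list of readings."""
    n = len(xs)
    if n == 0:
        raise ValueError("the mean of an empty list is undefined")
    return sum(xs) / n


print(round(mean([0.9, 1.4, 2.8, 3.1, 5.6]), 2))  # 2.76
print(mean([33, 28, 26, 35, 38]))                 # 32.0
```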


Example 1.16 


The members of an orchestra were asked how many instruments each could play. Here are 
their results. 


2-5) 24-1 Ft 2023 
324.2 2 22 4°32 
1 2 Bt 24:2 2 A D2 
Find the mean number of instruments played. 


Solution 1.16 


$n = 30$, $\sum x = 2 + 5 + 2 + 4 + \dots + 1 + 2 = 63$

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{63}{30} = 2.1$


The mean number of instruments played is 2.1. 


In the above example, the data could have been arranged in a frequency distribution:

Number of instruments, x   1    2    3   4   5
Frequency, f               11   10   5   3   1

The total number of instruments played can be calculated in an organised way as follows:

x   f         fx
1   11        11
2   10        20
3   5         15
4   3         12
5   1         5
    Σf = 30   Σfx = 63

Here Σf = 30 is the total number of people and Σfx = 63 is the total number of instruments played, so

$\bar{x} = \dfrac{\text{total number of instruments played}}{\text{total number of people}} = \dfrac{\sum fx}{\sum f} = \dfrac{63}{30} = 2.1$

The mean number of instruments played is 2.1.

Note that Σfx is sometimes written Σxf; remember that x and f are multiplied.

In general, for data in an ungrouped frequency distribution

$\bar{x} = \dfrac{\sum fx}{\sum f}$
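The frequency-table method above can be sketched in Python by pairing each value x with its frequency f. This is a minimal illustration, and the function name `frequency_mean` is hypothetical. The `zip` call lines up the x and f columns, and the generator expression builds the fx column.

```python
def frequency_mean(xs, fs):
    """Mean of an ungrouped frequency distribution: sum(f * x) / sum(f)."""
    sum_f = sum(fs)                                   # total frequency
    sum_fx = sum(f * x for x, f in zip(xs, fs))       # total of the fx column
    return sum_fx / sum_f


# Number of instruments played, from the table above.
print(frequency_mean([1, 2, 3, 4, 5], [11, 10, 5, 3, 1]))  # 2.1
```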


When the data have been grouped into intervals, the actual values of the readings are not 
known. You can only make an estimate of the mean. To do this, take the mid-interval value as 
representative of the interval. 


Remember that mid-interval value = ½(lower class boundary + upper class boundary)


Example 1.17 


The speeds, to the nearest mile per hour, of 120 vehicles passing a check point were recorded 
and are grouped in the table below. 


Speed (m.p.h.)       21-25   26-30   31-35   36-45   46-60
Number of vehicles   22      48      25      16      9

Estimate the mean of this distribution. (C Additional)


Solution 1.17 


Work out the mid-interval value for the first interval 21-25, using lower class boundary = 20.5 and upper class boundary = 25.5.

So mid-interval value = ½(20.5 + 25.5) = 23.

You then assume that all the values in the interval 21-25 are in fact 23.

Find the other mid-interval values and form a table:

Speed (m.p.h.)   Mid-interval value, x   f          fx
21-25            23                      22         506
26-30            28                      48         1344
31-35            33                      25         825
36-45            40.5                    16         648
46-60            53                      9          477
                                         Σf = 120   Σfx = 3800

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{3800}{120} = 31\tfrac{2}{3}$

The mean speed was $31\frac{2}{3}$ m.p.h.
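The steps in Solution 1.17 can be sketched in Python: find the class boundaries, take mid-interval values, then apply $\frac{\sum fx}{\sum f}$. This sketch assumes the class limits refer to data recorded to the nearest whole unit, so each boundary lies 0.5 outside the stated limit. The name `grouped_mean` is hypothetical.

```python
def grouped_mean(classes, fs):
    """Estimate the mean of grouped data from mid-interval values.

    classes holds (lower, upper) class limits for data recorded to the
    nearest whole unit, so the class boundaries are lower - 0.5 and
    upper + 0.5, and the mid-interval value is half their sum.
    """
    mids = [((low - 0.5) + (high + 0.5)) / 2 for low, high in classes]
    sum_f = sum(fs)
    sum_fx = sum(f * m for m, f in zip(mids, fs))
    return sum_fx / sum_f


speeds = [(21, 25), (26, 30), (31, 35), (36, 45), (46, 60)]
vehicles = [22, 48, 25, 16, 9]
print(round(grouped_mean(speeds, vehicles), 2))  # 31.67
```

The result is only an estimate, because every reading in a class is treated as if it were the mid-interval value.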


Using the calculator to find the mean 


You can use your calculator in ordinary computation mode to calculate the total and also do the division. It is more useful, however, to work in the statistical mode, known as SD or STAT mode. Your calculator may operate as in one of the examples below. If yours does not appear to follow one of the patterns, you will need to consult your calculator manual.

Notice that once you put in the data you have access not only to the value for the mean, but also to n and Σx.


Example 1.18 


Find the mean of the numbers 33, 28, 26, 35, 38.

Solution 1.18

                   Casio 570W/85W/85WA               Sharp
Set SD mode        [MODE] [MODE] [1] or [MODE] [2]   [MODE] [1]
Clear memories     [SHIFT] [Scl] [=]                 [2nd F] [CA]
Input data         [33] [DT]                         [33] [DATA]
                   [28] [DT]                         [28] [DATA]
                   [26] [DT]                         [26] [DATA]
                   [35] [DT]                         [35] [DATA]
                   [38] [DT]                         [38] [DATA]
To obtain
x̄ = 32             [SHIFT] [1] [=]
n = 5              [RCL] [C]
Σx = 160           [RCL] [B]
(The RCL letters are the red letters on the third row of the calculator.)
To clear SD mode   [MODE] [1]

From the calculator, the mean is 32.




Example 1.19 


Find the mean number of children per family for the following frequency distribution.

Number of children per family, x   1   2   3   4   5
Frequency, f                       3   4   8   2   3

Solution 1.19

                   Casio 570W/85W/85WA               Sharp
Set SD mode        [MODE] [MODE] [1] or [MODE] [2]   [MODE] [1]
Clear memories     [SHIFT] [Scl] [=]                 [2nd F] [CA]
Input data         [1] [SHIFT] [;] [3] [DT]          [1] [×] [3] [DATA]
                   [2] [SHIFT] [;] [4] [DT]          [2] [×] [4] [DATA]
                   [3] [SHIFT] [;] [8] [DT]          [3] [×] [8] [DATA]
                   [4] [SHIFT] [;] [2] [DT]          [4] [×] [2] [DATA]
                   [5] [SHIFT] [;] [3] [DT]          [5] [×] [3] [DATA]
To obtain
x̄ = 2.9            [SHIFT] [1] [=]                   [2nd F] [(]
Σf = 20            [RCL] [C]                         [2nd F] [)]
Σfx = 58           [RCL] [B]                         [2nd F] [+]
(The RCL letters are the red letters on the third row of the calculator.)
To clear SD mode   [MODE] [1]                        [MODE] [0]

From the calculator, the mean is 2.9 children per family.

Make sure that you input the data in the order x, then f. Remember that x usually comes first in the frequency table.


Example 1.20

[Figure: histogram; horizontal axis: mass in kg; vertical axis missing]

The diagram shows a histogram of the distribution of masses of 50 first-year University students. All the rectangles are there but the vertical axis has been torn off.


(a) Compile a grouped frequency table for the distribution. 
(b) Use the values in your frequency table to find an approximate value for the mean mass of 
the students. 


Solution 1.20 


Let one small square be h on the vertical axis.
Remember that in a histogram, the area of each rectangle is proportional to the frequency.

The areas are

5h × 10, 10h × 10, 18h × 5, 22h × 5, 10h × 15
i.e. 50h, 100h, 90h, 110h, 150h.

So the total area = 500h.
But total frequency = the number of students = 50, so

500h = 50
h = 0.1

This means that the frequencies are 5, 10, 9, 11, 15, giving a total of 50.

(a) The frequency distribution is

Mass, m (kg)   Frequency, f
40 ≤ m < 50    5
50 ≤ m < 60    10
60 ≤ m < 65    9
65 ≤ m < 70    11
70 ≤ m < 85    15

(b) Take the mid-point of each interval to represent that interval. For example, the mid-point of the interval 60 ≤ m < 65 is ½(60 + 65) = 62.5.

Mid-point, x   Frequency, f   fx
45             5              225
55             10             550
62.5           9              562.5
67.5           11             742.5
77.5           15             1162.5
               Σf = 50        Σfx = 3242.5

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{3242.5}{50} = 64.85$ kg
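The key idea in Solution 1.20 is that frequency is proportional to area. This makes it possible to recover the frequencies once the total is known. The Python sketch below shows the idea. The name `frequencies_from_histogram` is hypothetical, and it assumes that bar heights are measured in small squares and widths in the units of the horizontal axis.

```python
def frequencies_from_histogram(heights, widths, total):
    """Recover class frequencies when a histogram's vertical scale is missing.

    Frequency is proportional to area (height * width), so each area is
    rescaled so that the areas add up to the known total frequency.
    """
    areas = [h * w for h, w in zip(heights, widths)]
    area_sum = sum(areas)
    return [a * total / area_sum for a in areas]


# Heights in small squares and class widths (kg) read from Example 1.20.
freqs = frequencies_from_histogram([5, 10, 18, 22, 10], [10, 10, 5, 5, 15], 50)
print(freqs)  # [5.0, 10.0, 9.0, 11.0, 15.0]
```

Multiplying by `total` before dividing keeps the arithmetic exact for these whole-number inputs.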


Using the calculator:

                    Casio 570W/85W/85WA               Sharp
Set SD mode         [MODE] [MODE] [1] or [MODE] [2]   [MODE] [1]
Clear memories      [SHIFT] [Scl] [=]                 [2nd F] [CA]
Input data          [45] [SHIFT] [;] [5] [DT]         [45] [×] [5] [DATA]
in the order x, f   [55] [SHIFT] [;] [10] [DT]        [55] [×] [10] [DATA]
                    [62.5] [SHIFT] [;] [9] [DT]       [62.5] [×] [9] [DATA]
                    [67.5] [SHIFT] [;] [11] [DT]      [67.5] [×] [11] [DATA]
                    [77.5] [SHIFT] [;] [15] [DT]      [77.5] [×] [15] [DATA]
To obtain
x̄ = 64.85           [SHIFT] [1] [=]                   [2nd F] [(]
Σf = 50             [RCL] [C]
Σfx = 3242.5        [RCL] [B]
To clear SD mode    [MODE] [1]                        [MODE] [0]

Practise this yourself. Make sure that you are familiar with the method on your calculator.

Exercise 1d The mean

1. Find the mean of each of the following sets of numbers,
(i) not using SD mode,
(ii) using SD mode.
Compare your answers.
(a) 5, 6, 6, 8, 8, 9, 11, 13, 14, 17
(b) …
(c) …
(d) 1769, 1771, 1772, 1775, 1778, 1781, 1784
(e) 0.85, 0.88, 0.89, 0.93, 0.94, 0.96

2. A sample of 100 boxes of matches was taken and a record made of the number of matches per box. The results were as follows:

[Table: number of matches per box; frequency]

Calculate the mean number of matches per box.

3. … Find the mean number of books on a shelf.

Number of books   Number of shelves
…                 …
31-35             4
…                 …
46-50             13
51-55             3
56-60             2

4. The amounts spent by 120 motorists at a petrol station were recorded.

Amount spent, £x   Number of motorists
0 ≤ x < 5          12
5 ≤ x < 10         38
10 ≤ x < 15        42
15 ≤ x < 20        20
20 ≤ x < 40        8

(a) Draw a histogram to represent the data.
(b) Estimate the mean amount spent.

5. The age distribution of the population of a small village is recorded in the table below.

Age (years)   Number of people
0-            54
15-           78
30-           120
…             …
100-          0

Draw, on graph paper, a histogram to represent these data. Estimate the mean of this distribution. (C Additional)

6. Find the mean length for the data represented by the stem and leaf diagram.

[Stem and leaf diagram; key: 15 | 4 means … cm]

7. The height, correct to the nearest metre, was recorded for each of the 59 birch trees in an area of woodland. The heights are summarised in the following table.

Height (m)        5-9   10-12   13-15   16-18   19-28
Number of trees   …

(a) A student was asked to draw a histogram to illustrate the data and produced the following diagram.

[Figure: 'A histogram to illustrate the heights of birch trees'; horizontal axis: Height (mm)]

Give two critical comments on this attempt at a histogram.
(b) …
(c) Calculate an estimate of the mean height of the birch trees, giving your answer correct to three significant figures. (C)

8. Telephone calls arriving at a switchboard are answered by the telephonist. The following table shows the time, to the nearest second, recorded as being taken by the telephonist to answer the calls received during one day.

Time to answer (to nearest second)   Number of calls
10-19                                20
20-24                                20
25-29                                …
30                                   …
31-34                                16
…                                    …

(a) Represent these data by a histogram. Give a reason to justify the use of a histogram to represent these data.
(b) Calculate an estimate of the mean time taken to answer the calls. (L)
56-60 2 




Weighted means 


In some situations it may not be suitable to calculate an ordinary mean. There may be times 
when you wish to place greater emphasis on some of the values, as illustrated in the following 
example. 


Example 1.21 
A candidate obtained the following results in her GCSE mathematics examination:
Paper 1: 72%, Paper 2: 64%, Coursework: 73%

The regulations state that the two written papers have equal weighting and together count for 80% of the final result, whereas the coursework counts for 20%. What was the candidate's final mark?

Solution 1.21

The weights are in the following ratio:
40% : 40% : 20% = 4 : 4 : 2 = 2 : 2 : 1.

For the final result, you have to take this weighting into account:

weighted mean = $\dfrac{2(72) + 2(64) + 1(73)}{2 + 2 + 1} = \dfrac{345}{5} = 69$

Therefore the final mark is 69%.

In general, if $x_1, x_2, \dots, x_n$ are given weights $w_1, w_2, \dots, w_n$, then

weighted mean = $\dfrac{\sum wx}{\sum w}$
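The general formula translates into a short Python sketch. The name `weighted_mean` is our own, for illustration. The usage lines check Example 1.21 with the simplified weights 2 : 2 : 1 and with the original percentages 40 : 40 : 20. Both give the same answer, because only the ratio of the weights matters.

```python
def weighted_mean(xs, ws):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(xs) != len(ws):
        raise ValueError("each value needs exactly one weight")
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


# Example 1.21: Paper 1, Paper 2 and Coursework.
print(weighted_mean([72, 64, 73], [2, 2, 1]))     # 69.0
print(weighted_mean([72, 64, 73], [40, 40, 20]))  # 69.0
```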


Exercise 1e Weighted means

1. Find the weighted mean of the numbers 8 and 12, if they are given the weights 2 and 3 respectively.

2. The final mark allocated to a student is calculated from her mark in each subject.
(a) The class teacher worked out an ordinary mean.
(b) The headteacher decided to weight the subjects in proportion to the number of lessons per week, as shown in the table.

Subject       Mark   Number of lessons per week
Mathematics   64%    5
English       52%    4
Science       71%    6
French        75%    3
History       82%    4

Which method gave the higher mark and by how much?

3. The prices of articles A, B and C are £30, £42 and £65. Find the mean price, if the three articles are given weights of 5, 3 and 2 respectively.

4. The weighted mean of the two numbers 30 and 15 is 20. If the weightings are 2 and x respectively, find x.

5. Two students, Jack and Jill, take an examination in French, German and English. The table below shows the marks for each student and the weight to be applied to each subject.

Subject          French   German   English
Marks for Jack   …
Marks for Jill   …
Weight           …

Calculate the value of x for which Jack and Jill have the same weighted mean mark and find the value of this mean. (C Additional)


VARIABILITY OF DATA 


Each of these sets of numbers has a mean of 7 but the spread of each set is different:

(a) 7, 7, 7, 7, 7
(b) 4, 6, 6.5, 7.2, 11.3
(c) -193, -46, 28, 69, 177

There is no variability in set (a), but the numbers in set (c) are obviously much more spread out than those in set (b).

There are various ways of measuring the variability or spread of a distribution, two of which are described here.

The range

The range is based entirely on the extreme values of the distribution.

range = highest value - lowest value

In (a) the range = 7 - 7 = 0
In (b) the range = 11.3 - 4 = 7.3
In (c) the range = 177 - (-193) = 370

Note that there are also ranges based on particular observations within the data; these percentile and quartile ranges are considered on page 68.


THE STANDARD DEVIATION, s, AND THE VARIANCE, s²

The standard deviation, s, is a very important and useful measure of spread. It gives a measure of the deviations of the readings from the mean, $\bar{x}$. It is calculated using all the values in the distribution. To calculate s:

• for each reading x, calculate $x - \bar{x}$, its deviation from the mean,
• square this deviation to give $(x - \bar{x})^2$ and note that, irrespective of whether the deviation was positive or negative, this is now positive,
• find $\sum (x - \bar{x})^2$, the sum of all these values,
• find the average by dividing the sum by n, the number of readings; this gives $\dfrac{\sum (x - \bar{x})^2}{n}$ and is known as the variance,
• finally take the positive square root of the variance to obtain the standard deviation, s.

The standard deviation of n readings with mean $\bar{x}$ is given by

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
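The steps listed above can be followed one by one in code. The minimal Python sketch below divides by n, as this section does. The name `standard_deviation` is our own. The standard library's `statistics.pstdev` uses the same divide-by-n definition and could serve as a cross-check. The three data sets are (a), (b) and (c) from the start of this section.

```python
import math


def standard_deviation(xs):
    """s = sqrt(sum((x - x_bar)^2) / n), following the steps listed above."""
    n = len(xs)
    x_bar = sum(xs) / n                                  # the mean
    squared_deviations = [(x - x_bar) ** 2 for x in xs]  # (x - x_bar)^2
    variance = sum(squared_deviations) / n               # average of the squares
    return math.sqrt(variance)                           # positive square root


for data in ([7, 7, 7, 7, 7], [4, 6, 6.5, 7.2, 11.3], [-193, -46, 28, 69, 177]):
    print(round(standard_deviation(data), 2))
```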


Each of the three sets of numbers on the previous page has mean 7, i.e. $\bar{x} = 7$.

(a) For the set 7, 7, 7, 7, 7

Since $x - \bar{x} = 7 - 7 = 0$ for every reading, s = 0, indicating that there is no deviation from the mean.

(b) For the set 4, 6, 6.5, 7.2, 11.3

$\sum (x - \bar{x})^2 = (4 - 7)^2 + (6 - 7)^2 + (6.5 - 7)^2 + (7.2 - 7)^2 + (11.3 - 7)^2 = 28.78$

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{28.78}{5}} = 2.40$ (2 d.p.)

(c) For the set -193, -46, 28, 69, 177

$\sum (x - \bar{x})^2 = (-193 - 7)^2 + (-46 - 7)^2 + (28 - 7)^2 + (69 - 7)^2 + (177 - 7)^2 = 75\,994$

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{75\,994}{5}} = 123.28$ (2 d.p.)

Notice that set (c) has a much higher standard deviation than set (b), confirming that it is much more spread out about the mean.

Remember that
standard deviation = $\sqrt{\text{variance}}$
variance = (standard deviation)²

NOTE:

• The standard deviation gives an indication of the lowest and highest values of the data as follows. In most distributions, the bulk of the distribution lies within two standard deviations of the mean, i.e. within the interval $\bar{x} \pm 2s$ or $(\bar{x} - 2s, \bar{x} + 2s)$. This helps to give an idea of the spread of the data.
• The units of standard deviation are the same as the units of the data.
• Standard deviations are useful when comparing sets of data; the higher the standard deviation, the greater the variability in the data.

Example 1.22

Two machines, A and B, are used to pack biscuits. A random sample of ten packets was taken from each machine and the mass of each packet was measured to the nearest gram and noted.

Find the standard deviation of the masses of the packets taken in the sample from each machine. Comment on your answer.

Machine A (mass in g): 196, 198, 198, 199, 200, 200, 201, 201, 202, 205
Machine B (mass in g): 192, 194, 195, 198, 200, 201, 203, 204, 206, 207

Solution 1.22

Machine A: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{2000}{10} = 200$     Machine B: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{2000}{10} = 200$

Since the mean mass for each machine is 200, $x - \bar{x} = x - 200$.

To calculate s, put the data into a table:

Machine A                          Machine B
x     x - 200   (x - 200)²         x     x - 200   (x - 200)²
196   -4        16                 192   -8        64
198   -2        4                  194   -6        36
198   -2        4                  195   -5        25
199   -1        1                  198   -2        4
200   0         0                  200   0         0
200   0         0                  201   1         1
201   1         1                  203   3         9
201   1         1                  204   4         16
202   2         4                  206   6         36
205   5         25                 207   7         49
                56                                 240

Machine A: $s^2 = \dfrac{\sum (x - 200)^2}{n} = \dfrac{56}{10} = 5.6$, so $s = \sqrt{5.6} = 2.37$ (2 d.p.)
Machine B: $s^2 = \dfrac{\sum (x - 200)^2}{n} = \dfrac{240}{10} = 24$, so $s = \sqrt{24} = 4.90$ (2 d.p.)

Machine A: s.d. = 2.37 g (2 d.p.)     Machine B: s.d. = 4.90 g (2 d.p.)


Machine A has less variation, indicating that it is more reliable than machine B. 


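Alternative form of the formula for standard deviation

The formula given above is sometimes difficult to use, especially when $\bar{x}$ is not an integer, so an alternative form is often used. This is derived as follows:

$$
\begin{aligned}
s^2 &= \frac{1}{n}\sum (x - \bar{x})^2 \\
    &= \frac{1}{n}\sum \left(x^2 - 2\bar{x}x + \bar{x}^2\right) \\
    &= \frac{1}{n}\left(\sum x^2 - 2\bar{x}\sum x + n\bar{x}^2\right) \\
    &= \frac{\sum x^2}{n} - 2\bar{x}(\bar{x}) + \bar{x}^2 \qquad \text{since } \frac{\sum x}{n} = \bar{x} \\
    &= \frac{\sum x^2}{n} - \bar{x}^2
\end{aligned}
$$

so $s = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$.

NOTE: It is useful to remember that $\dfrac{\sum x^2}{n} - \bar{x}^2$ can be thought of as 'the mean of the squares minus the square of the mean'.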
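The alternative form needs only three running totals: n, Σx and Σx². These are the same quantities a calculator stores in SD mode. The minimal Python sketch below uses them. The name `sd_from_sums` is hypothetical, and the data are those of Example 1.23.

```python
import math


def sd_from_sums(n, sum_x, sum_x2):
    """s = sqrt(sum(x^2)/n - x_bar^2): the mean of the squares minus the square of the mean."""
    x_bar = sum_x / n
    return math.sqrt(sum_x2 / n - x_bar ** 2)


xs = [2, 3, 5, 6, 8]
print(round(sd_from_sums(len(xs), sum(xs), sum(x * x for x in xs)), 2))  # 2.14
```

One caution applies to computer arithmetic. When the readings are large and close together, subtracting two nearly equal quantities can lose accuracy in floating point. In that case the first form, $\sum (x - \bar{x})^2 / n$, is the safer one to compute.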


Example 1.23 


The mean of the five numbers 2, 3, 5, 6, 8 is 4.8. Calculate the standard deviation.

Solution 1.23

Method 1 using $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

x   x - x̄   (x - x̄)²
2   -2.8    7.84
3   -1.8    3.24
5   0.2     0.04
6   1.2     1.44
8   3.2     10.24
            22.80

$s^2 = \dfrac{22.80}{5} = 4.56$
$s = \sqrt{4.56} = 2.14$ (2 d.p.)

Method 2 using $s = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$

$\sum x^2 = 2^2 + 3^2 + 5^2 + 6^2 + 8^2 = 138$
$s^2 = \dfrac{138}{5} - (4.8)^2 = 4.56$
$s = \sqrt{4.56} = 2.14$ (2 d.p.)

The working for method 2 is less involved.


Using the calculator to find the standard deviation

The standard deviation can be found directly using the calculator in SD mode. The numbers are entered in the same way as when you are finding the mean.

To find the standard deviation of the five numbers 2, 3, 5, 6, 8 used in Example 1.23:

                   Casio 570W/85W/85WA               Sharp
Set SD mode        [MODE] [MODE] [1] or [MODE] [2]   [MODE] [1]
Clear memories     [SHIFT] [Scl] [=]                 [2nd F] [CA]
Input data         [2] [DT]                          [2] [DATA]
                   [3] [DT]                          [3] [DATA]
                   [5] [DT]                          [5] [DATA]
                   [6] [DT]                          [6] [DATA]
                   [8] [DT]                          [8] [DATA]
To obtain
s = 2.135...       [SHIFT] [2] [=]                   …
You can check
x̄ = 4.8            [SHIFT] [1] [=]                   [2nd F] [(]
Σx = 24            [RCL] [B]                         [2nd F] [+]
Σx² = 138          [RCL] [A]                         …
n = 5              [RCL] [C]                         [2nd F] [)]
To clear SD mode   [MODE] [1]                        [MODE] [0]


For a frequency distribution, the formula for s is

$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$ or, equivalently, $s = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2}$

Consider again the data given in Example 1.19, on page 32, which shows the number of children in 20 families. The mean is 2.9.

Number of children per family, x    1    2    3    4    5
Frequency, f                        3    4    8    2    3

You could use one of these three methods for finding the standard deviation. Method 2 is more popular than Method 1.


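Method 1: using $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$

x    x − 2.9    (x − 2.9)²    f    f(x − 2.9)²
1    −1.9       3.61          3    10.83
2    −0.9       0.81          4    3.24
3    0.1        0.01          8    0.08
4    1.1        1.21          2    2.42
5    2.1        4.41          3    13.23
                              Σf = 20    Σf(x − x̄)² = 29.80

$s = \sqrt{\frac{\sum f(x - 2.9)^2}{\sum f}} = \sqrt{\frac{29.80}{20}} = \sqrt{1.49} = 1.22$ (2 d.p.)

The standard deviation of the number of children per family is 1.22 (2 d.p.).

Method 2: using $s = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2}$

$\sum fx^2 = 3(1^2) + 4(2^2) + 8(3^2) + 2(4^2) + 3(5^2) = 198$

$s^2 = \frac{\sum fx^2}{\sum f} - \bar{x}^2 = \frac{198}{20} - 2.9^2 = 1.49$

$s = \sqrt{1.49} = 1.22$ (2 d.p.)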


The standard deviation is 1.22 (2 d.p.), as before. 
Method 3: using the calculator in SD mode.

This time you need to take account of the frequencies, and this is done in exactly the same way as when finding the mean:

                          Casio 570W/85W/85WA                  Sharp
Set SD mode               [MODE] [MODE] [1] or [MODE] [2]      [MODE] [1]
Clear memories            [SHIFT] [Scl] [=]                    [2nd F] [CA]
Input data                1 [SHIFT] [;] 3 [DT]                 1 [×] 3 [DATA]
                          2 [SHIFT] [;] 4 [DT]                 2 [×] 4 [DATA]
                          3 [SHIFT] [;] 8 [DT]                 3 [×] 8 [DATA]
                          4 [SHIFT] [;] 2 [DT]                 4 [×] 2 [DATA]
                          5 [SHIFT] [;] 3 [DT]                 5 [×] 3 [DATA]
To obtain
x̄ = 2.9                   [SHIFT] [1] [=]                      [2nd F] [x̄]
s = 1.220...              [SHIFT] [2] [=]                      [2nd F] [σx]
Σf = 20                   [RCL] [C]                            [2nd F] [n]
Σfx = 58                  [RCL] [B]                            [2nd F] [Σx]
Σfx² = 198                [RCL] [A]                            [2nd F] [Σx²]
To clear SD mode          [MODE] [1]                           [MODE] [0]

(On the Casio, A, B and C are the red letters on the third row of the calculator.)


Therefore the standard deviation is 1.22 (2 d.p.), as before. 
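For a frequency distribution each value simply counts f times. The following is a minimal sketch of Method 2 in Python, using the Example 1.19 data; plain lists and `zip` are used so that every sum in the formula is visible.

```python
from math import sqrt

# Example 1.19: number of children per family (x) and frequency (f)
xs = [1, 2, 3, 4, 5]
fs = [3, 4, 8, 2, 3]

n = sum(fs)                                        # sum f = 20
mean = sum(f * x for x, f in zip(xs, fs)) / n      # sum fx / sum f
# Method 2: s^2 = sum(f x^2) / sum f - mean^2
var = sum(f * x * x for x, f in zip(xs, fs)) / n - mean ** 2
s = sqrt(var)
print(n, mean, round(var, 2), round(s, 2))
```

This should print 20, 2.9, 1.49 and 1.22, matching the three methods above.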


In a grouped frequency distribution, the mid-interval value is taken as representative of the interval, as in the following example.

Example 1.24

[Histogram: frequency density (candidates per minute) against time (minutes)]

An intelligence test was taken by 115 candidates. For each candidate the time taken to complete the test was recorded, and the times were summarised in a histogram (see diagram). Write down the frequency for each of the class intervals 0-1, 1-2, 2-3, 3-5 and 5-10 minutes.

Calculate estimates of the mean and standard deviation of the times taken to complete the test. (C)

Solution 1.24

Frequency = frequency density × interval width. Note that the interval 2-3, for example, represents 2 < time < 3.

Time (min)    0-1    1-2    2-3    3-5    5-10
Frequency     10     15     25     40     25


To calculate estimates for the mean and standard deviation, use mid-interval values, x. 


Time (min)    x      f      fx       fx²
0-1           0.5    10     5        2.5
1-2           1.5    15     22.5     33.75
2-3           2.5    25     62.5     156.25
3-5           4      40     160      640
5-10          7.5    25     187.5    1406.25
                     Σf = 115    Σfx = 437.5    Σfx² = 2238.75

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{437.5}{115} = 3.80... = 3.8$ (2 s.f.)

$s = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{2238.75}{115} - (3.80...)^2} = 2.2$ (2 s.f.)


The mean time is 3.8 minutes and the standard deviation is 2.2 minutes. 


[You could have calculated these directly using the calculator in SD mode. Check them 
yourself.] 
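The same estimates can be checked with a short script. This sketch takes the class boundaries and frequencies from Solution 1.24 and represents each class by its mid-interval value, so, as in the text, the results are estimates rather than exact values.

```python
from math import sqrt

# Example 1.24: class boundaries (minutes) and frequencies
classes = [(0, 1), (1, 2), (2, 3), (3, 5), (5, 10)]
freqs = [10, 15, 25, 40, 25]

mids = [(lo + hi) / 2 for lo, hi in classes]   # mid-interval values
n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))

mean = sum_fx / n
s = sqrt(sum_fx2 / n - mean ** 2)
print(n, sum_fx, sum_fx2, round(mean, 2), round(s, 2))
```

It should print 115, 437.5, 2238.75, 3.8 and 2.23, agreeing with the values above (3.8 and 2.2 to 2 s.f.).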




If you are given summary information, rather than the raw data or frequency distribution, you 
cannot use the calculator in SD mode. You will have to use the formulae to calculate the mean 
and standard deviation, as in the following example. 


Example 1.25

(a) Cartons of orange juice are advertised as containing 1 litre. A random sample of 100 cartons gave the following results for the volume, x.

$\sum x = 101.4$, $\sum x^2 = 102.83$

Calculate the mean and the standard deviation of the volume of orange juice in these 100 cartons.

(b) A machine is supposed to cut lengths of rod 50 cm long. A sample of 20 rods gave the following results for the length, x.

$\sum fx = 997$, $\sum fx^2 = 49\,711$

(i) Calculate the mean length of the 20 rods.

(ii) Calculate the variance of the lengths of the 20 rods. State the units of the variance in your answer.

Solution 1.25

(a) $\sum x = 101.4$, $\sum x^2 = 102.83$, $n = 100$

$\bar{x} = \frac{\sum x}{n} = \frac{101.4}{100} = 1.014$

The mean volume is 1.014 litres.

$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{102.83}{100} - 1.014^2} = 0.0101...$

The standard deviation of the volume is 0.010 litres (2 s.f.).

(b) $\sum fx = 997$, $\sum fx^2 = 49\,711$, $\sum f = 20$

(i) $\bar{x} = \frac{\sum fx}{\sum f} = \frac{997}{20} = 49.85$

The mean length of the rods is 49.85 cm.

(ii) Variance $= \frac{\sum fx^2}{\sum f} - \bar{x}^2 = \frac{49\,711}{20} - 49.85^2 = 0.5275$

The variance is 0.5275 cm².
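When only summary information is available, the calculation reduces to the shortcut formula. Below is a small sketch; `mean_and_variance` is a hypothetical helper written for this illustration (divisor n, as in the text), applied to both parts of Example 1.25.

```python
from math import sqrt

def mean_and_variance(n, sum_x, sum_x2):
    """Mean and variance (divisor n) from n, sum of x and sum of x^2."""
    mean = sum_x / n
    return mean, sum_x2 / n - mean ** 2

# Example 1.25(a): 100 cartons
m, v = mean_and_variance(100, 101.4, 102.83)
print(round(m, 3), round(sqrt(v), 4))   # mean in litres, s.d. in litres

# Example 1.25(b): 20 rods (here n is the sum of the frequencies)
m, v = mean_and_variance(20, 997, 49711)
print(m, round(v, 4))                   # mean in cm, variance in cm^2
```

The first line should show 1.014 and about 0.0102; the second 49.85 and 0.5275.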


Exercise 1f Mean and standard deviation

1. Do not use the statistical program on your calculator for this question.
(i) For each of the following sets of numbers, calculate the mean and the standard deviation. Try using both forms of the formula for the standard deviation in parts (a) to (c). In parts (d) to (f) choose one of the methods.
(a) 2, 4, 5, 6, 8
(b) 6, 8, 9, 14
(c) 11, 14, 17, 23, 29
(d) 5, 13, 7, 9, 16, 15
(e) 4.6, 2.7, 3.1, 0.5, 6.2
(f) 200, 203, 206, 207, 209
(ii) Now check your answers using your calculator in SD (STAT) mode.

2. The table shows the weekly wages in £ of each of 100 factory workers.
(a) Draw a histogram to illustrate this information.
(b) Calculate the mean wage and the standard deviation.

Wage (£)           Number of workers
200 < x < 250      10
250 < x < 300      16
300 < x < 375      40
375 < x < 400      26
400 < x < 500      8

3. Do this question
(a) without using SD mode,
(b) using SD mode on your calculator.
The score for a round of golf for each of 50 club members was noted. Find the mean score for a round and the standard deviation.

Score, x    Frequency, f
66          2
67          5
68          10
69          12
70          9
71          6
72          4
73          2

4. The scores in an IQ test for 60 candidates are shown in the table. Find the mean score and the standard deviation.

Score       Frequency
100-106     8
107-113     13
114-120     24
121-127     11
128-134     4

5. The stemplot shows the times, recorded to the nearest second, of 12 people in a race. Calculate the mean time and the standard deviation.

Stem | Leaf
1    | 2 3
1    | 5 5 6 6 6
1    | [illegible]
2    | 0 1 4
Key: 1|5 means 15 seconds

6. A vertical line graph for a set of data is shown below. Calculate the mean and standard deviation of the data.

[Vertical line graph of frequency for a set of data]

7. The following table shows the duration of 40 telephone calls from an office via the switchboard.
(a) Obtain an estimate of the mean length of a telephone call and the standard deviation.
(b) Illustrate the data graphically.

Duration in minutes    Number of calls
< 1                    6
1-2                    10
2-3                    15
3-5                    5
5-10                   4
> 10                   0
(O&C)

8. For a set of ten numbers $\sum x = 2.90$ and $\sum x^2 = 8469$. Find the mean and the variance.

9. For a set of nine numbers $\sum (x - \bar{x})^2 = 234$. Find the standard deviation of the numbers.

10. For a set of nine numbers $\sum (x - \bar{x})^2 = 60$ and $\sum x^2 = 285$. Find the mean of the numbers.

11. A group of 20 people played a game. The table below shows the frequency distribution of their scores.

Score               1    2    4    x
Number of people    2    5    7    6

Given that the mean score is 5, find
(a) the value of x,
(b) the variance of the distribution.
(C Additional)

12. From the information given about each of the following sets of data, work out the missing values in the table:

       n     Σx       Σx²        x̄      s
(a)    63    7623     924 800
(b)          152.6               10.9   1.7
(c)    52    57300    33
(d)    18    [illegible]

13. At a bird observatory, migrating willow warblers are caught, measured and ringed before being released. The histogram below illustrates the lengths, in millimetres, of the willow warblers caught during one migration season.

[Histogram: frequency density (number of birds per mm of length) against length (mm)]

(a) Explain how the histogram shows that the total number of willow warblers caught at the observatory during the migration season is 118.
(b) State briefly how it may be deduced from the histogram (without any calculation) that an estimate of the mean length is 111 mm. Explain briefly why this value may not be the true mean length of the willow warblers caught.
(c) Given that the lengths, x mm, of the willow warblers caught during this migration season were such that $\sum x = 13\,099$ and $\sum x^2 = 1\,455\,506$, calculate the standard deviation of the lengths. (C)

14. For a particular set of observations $\sum f = 20$, $\sum fx^2 = 16\,143$, $\sum fx = 563$. Find the values of the mean and the standard deviation.

15. For a given frequency distribution $\sum f(x - \bar{x})^2 = 182.3$, $\sum fx^2 = 1025$, $\sum f = 30$. Find the mean of the distribution.

16. The speeds of cars passing a speed camera are shown in the histogram. Calculate estimates of the mean speed and the standard deviation.

[Histogram: frequency density against speed (m.p.h.)]

Calculations involving the mean and standard deviation 


Example 1.26 


(a) Calculate the mean and the standard deviation of the four numbers 2, 3, 6, 9.

(b) Two numbers, a and b, are to be added to this set of four numbers, such that the mean is increased by 1 and the variance is increased by 2.5. Find a and b. (L Additional)

Solution 1.26

(a) $\bar{x} = \frac{2 + 3 + 6 + 9}{4} = 5$

$s^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{4 + 9 + 36 + 81}{4} - 5^2 = 7.5$, so $s = \sqrt{7.5} = 2.7$ (2 s.f.)

(b) New mean = 5 + 1 = 6

$\frac{2 + 3 + 6 + 9 + a + b}{6} = 6$
$\frac{20 + a + b}{6} = 6$
$20 + a + b = 36$
$a + b = 16$ .....(1)

Variance of original set = $s^2$ = 7.5. So new variance = 7.5 + 2.5 = 10.

$\frac{4 + 9 + 36 + 81 + a^2 + b^2}{6} - 6^2 = 10$
$\frac{130 + a^2 + b^2}{6} = 46$
$130 + a^2 + b^2 = 276$
$a^2 + b^2 = 146$ .....(2)

From (1), b = 16 − a. Substituting in (2):

$a^2 + (16 - a)^2 = 146$
$a^2 + 256 - 32a + a^2 = 146$
$2a^2 - 32a + 110 = 0$
$a^2 - 16a + 55 = 0$
$(a - 11)(a - 5) = 0$
a = 11 or a = 5
If a = 11, b = 16 − 11 = 5
If a = 5, b = 16 − 5 = 11

So the two numbers are 5 and 11.
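The working in Solution 1.26(b) can be followed step by step in code. In this sketch, `extra_pair` is a hypothetical helper written for illustration: it builds equations (1) and (2) for a + b and a² + b², finds ab from $(a + b)^2 = a^2 + b^2 + 2ab$, and then solves the quadratic $t^2 - (a + b)t + ab = 0$. It assumes real solutions exist (otherwise `sqrt` raises `ValueError`).

```python
from math import sqrt

def extra_pair(values, mean_increase, var_increase):
    """Two numbers that, appended to values, raise the mean and the
    divisor-n variance by the given amounts (smaller number first)."""
    n = len(values)
    sx = sum(values)
    sx2 = sum(x * x for x in values)
    old_mean = sx / n
    old_var = sx2 / n - old_mean ** 2
    new_mean = old_mean + mean_increase
    new_var = old_var + var_increase
    total = new_mean * (n + 2) - sx                          # a + b, equation (1)
    squares = (new_var + new_mean ** 2) * (n + 2) - sx2     # a^2 + b^2, equation (2)
    product = (total ** 2 - squares) / 2                     # ab
    root = sqrt(total ** 2 - 4 * product)                    # discriminant of t^2 - (a+b)t + ab
    return (total - root) / 2, (total + root) / 2

print(extra_pair([2, 3, 6, 9], 1, 2.5))
```

For Example 1.26 this should return (5.0, 11.0).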


COMBINING SETS OF DATA 
Example 1.27 


The number of errors, x, on each of 200 pages of typescript was monitored. The results when summarised showed that

$\sum x = 920$, $\sum x^2 = 5032$

(a) Calculate the mean and the standard deviation of the number of errors per page.

A further 50 pages were monitored and it was found that the mean was 4.4 errors and the standard deviation was 2.2 errors.

(b) Find the mean and the standard deviation of the number of errors per page for the 250 pages. (L)

Solution 1.27

(a) $\bar{x} = \frac{\sum x}{n} = \frac{920}{200} = 4.6$

$s^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{5032}{200} - 4.6^2 = 4$

$s = \sqrt{4} = 2$

The mean is 4.6 errors per page and the standard deviation is 2 errors.

(b) For the errors, y, on the further 50 pages:

Mean = 4.4, so $4.4 = \frac{\sum y}{50}$ and $\sum y = 50 \times 4.4 = 220$

The standard deviation = 2.2, so $2.2^2 = \frac{\sum y^2}{50} - 4.4^2$ and $\sum y^2 = 50(2.2^2 + 4.4^2) = 1210$

For the combined set of 250 pages:

Total number of errors = $\sum x + \sum y = 920 + 220 = 1140$

Mean = $\frac{1140}{250} = 4.56$

(Standard deviation)² = $\frac{\sum x^2 + \sum y^2}{250} - 4.56^2 = \frac{5032 + 1210}{250} - 4.56^2 = 4.1744$

Standard deviation = $\sqrt{4.1744}$ = 2.04 (3 s.f.)

In general, for a combined set of numbers

mean = $\frac{\sum x + \sum y}{n_x + n_y}$ and variance = $\frac{\sum x^2 + \sum y^2}{n_x + n_y} - (\text{mean})^2$

Remember that standard deviation = $\sqrt{\text{variance}}$.
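The general rule for combining sets can be written as a short function. This is a sketch under the book's convention (divisor n): `combine` is an illustrative name, and for each group it rebuilds $\sum x = n\bar{x}$ and $\sum x^2 = n(s^2 + \bar{x}^2)$ before pooling.

```python
from math import sqrt

def combine(groups):
    """Pool (n, mean, sd) summaries into the mean and sd of all the data.

    Uses sum x = n * mean and sum x^2 = n * (sd^2 + mean^2) for each group,
    with the divisor-n standard deviation as in the text.
    """
    n = sum(k for k, _, _ in groups)
    sum_x = sum(k * m for k, m, _ in groups)
    sum_x2 = sum(k * (s ** 2 + m ** 2) for k, m, s in groups)
    mean = sum_x / n
    return mean, sqrt(sum_x2 / n - mean ** 2)

# Example 1.27: 200 pages (mean 4.6, sd 2) and 50 pages (mean 4.4, sd 2.2)
print(combine([(200, 4.6, 2.0), (50, 4.4, 2.2)]))
```

This should give a mean of about 4.56 and a standard deviation of about 2.04, as in Solution 1.27.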


Example 1.28 


Three statistics students, Ali, Les and Sam, spent the day fishing. They caught three different 
types of fish and recorded the type and mass (correct to the nearest 0.01 kg) of each fish 
caught. At 4 p.m., they summarised the results as follows. 


         Number of fish by type        All fish caught
         Perch    Tench    Roach       Mean mass (kg)    Standard deviation (kg)
Ali      2        3        7           1.07
Les      6        2        8           0.76
Sam      1        0        1           1.00

(a) State how it may be deduced from the data that the mass of each fish caught by Sam was 1.00 kg.

(b) The winner was the person who had caught the greatest total mass of fish by 4 p.m. Determine who was the winner, showing your working.

(c) Before leaving the waterside, Sam catches one more fish and weighs it. He then announces that, if this extra fish is included with the other two fish he caught, the standard deviation is 1.00 kg. Find the mass of this extra fish. (C)

Solution 1.28

(a) If the standard deviation is 0, there is no deviation from the mean. All the readings must be exactly the same as the mean.

Since the mean is 1.00 kg, both fish must have weighed 1.00 kg.

(b)      Number of fish    Mean       Total mass
Ali      12                1.07 kg    12 × 1.07 = 12.84 kg
Les      16                0.76 kg    16 × 0.76 = 12.16 kg
Sam      2                 1.00 kg    2 × 1.00 = 2.00 kg

The winner was Ali.

(c) Sam: let the mass of the extra fish be x kg, so the masses of his three fish are 1, 1, x.

$\bar{x} = \frac{2 + x}{3}$

$s = \sqrt{\frac{1^2 + 1^2 + x^2}{3} - \left(\frac{2 + x}{3}\right)^2}$

$1.00 = \sqrt{\frac{2 + x^2}{3} - \left(\frac{2 + x}{3}\right)^2}$

$1 = \frac{2 + x^2}{3} - \frac{4 + 4x + x^2}{9}$ (squaring both sides)

$9 = 3(2 + x^2) - (4 + 4x + x^2)$ (multiplying by 9)

$9 = 6 + 3x^2 - 4 - 4x - x^2$

$0 = 2x^2 - 4x - 7$

$x = \frac{4 \pm \sqrt{16 - 4(2)(-7)}}{4} = \frac{4 \pm \sqrt{72}}{4}$

x = 3.121... (ignoring the negative value for x)

Mass of Sam's extra fish is 3.12 kg (2 d.p.)
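A quick numerical check of part (c): solve $2x^2 - 4x - 7 = 0$ with the quadratic formula, then confirm that the three masses 1, 1, x really have a standard deviation of 1. The check uses `statistics.pstdev`, which divides by n like the formula in the text.

```python
from math import sqrt
from statistics import pstdev

# Positive root of 2x^2 - 4x - 7 = 0 from the quadratic formula
a, b, c = 2, -4, -7
x = (-b + sqrt(b * b - 4 * a * c)) / (2 * a)
print(round(x, 2))                  # mass of the extra fish in kg

# Check: divisor-n standard deviation of Sam's three fish
print(round(pstdev([1, 1, x]), 6))
```

The first line should show 3.12 and the second 1.0.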
Exercise 1g Calculations involving the mean and standard deviation


1. The mean of ten numbers is 8. If an eleventh number is now included in the results, the mean becomes 9. What is the value of the eleventh number?

2. The mean of four numbers is 5, and the mean of three different numbers is 12. What is the mean of the seven numbers together?

3. The mean of n numbers is 5. If the number 13 is now included with the numbers, the new mean is 6. Find the value of n.

4. The mean of the numbers 3, 6, 7, a, 14 is 8. Find the standard deviation of the set of numbers.

5. The numbers a, b, 8, 5, 7 have mean 6 and variance 2. Find the values of a and b, if a > b.

6. For a set of 20 numbers $\sum x = 300$ and $\sum x^2 = 5500$. For a second set of 30 numbers $\sum x = 480$ and $\sum x^2 = 9600$. Find the mean and the standard deviation of the combined set of 50 numbers.

7. If the mean of the following frequency distribution is 3.66, find the value of a.

x    1    2    3    4     5    6
f    3    9    a    11    8    7

8. A bag contained five balls each bearing one of the numbers 1, 2, 3, 4, 5. A ball was drawn from the bag, its number noted, and then replaced. This was done 50 times in all and the table below shows the resulting frequency distribution.

Number       1    2     3    4    5
Frequency    x    11    y    8    9

If the mean is 2.7, determine the values of x and y.

9. Parplan Opinion Polls Ltd conducted a nationwide survey into the attitudes of teenage girls. One of the questions asked was "What is the ideal age for a girl to have her first baby?" In reply, the sample of 165 girls from the Northern zone gave a mean of 23.4 years and a standard deviation of 1.6 years. Subsequently, the overall sample of 384 girls (Northern plus Southern zones) gave a mean of 24.8 years and a standard deviation of 2.2 years.

Assuming that no girl was consulted twice, calculate the mean and standard deviation for the 219 girls from the Southern zone. (AEB)


10. The manager of a car showroom monitored the numbers of cars sold during two successive five-day periods. During the first five days the numbers of cars sold per day had mean 1.8 and variance 0.56. During the next five days the numbers of cars sold per day had mean 2.8 and variance 1.76. Find the mean and variance of the numbers of cars sold per day during the full ten days. (NEAB)

11. Prior to the start of delicate wage negotiations in a large company, the unions and the management take independent samples of the work force and ask them at what percentage level they believe a settlement should be made. The results are as follows:

Sample          Size    Mean     Standard deviation
'management'    350     12.4%    2.1%
'union'         237     10.7%    1.8%

Assuming that no individual was consulted by both sides, calculate the mean and standard deviation for these 587 workers. (AEB)

12. In a germination experiment, 200 rows of seeds, with ten seeds per row, were incubated. The frequency distribution of the number of seeds which germinated per row is shown below.


Number of seeds germinated Frequency 


4 
10 
16 
28 
34 
44 


ae 
SOON A UAW NE OS 


(a) Calculate the mean and the standard 
deviation of the number of seeds 
germinating per row. 


For another 50 rows an analysis shows that the mean is 4.4 seeds and the standard deviation is 2.2 seeds.

(b) Determine the mean and, to two decimal 
places, the standard deviation for the 
250 rows. 


13. The figures in the table below are the ages, to the nearest year, of a random sample of 30 people negotiating a mortgage with a bank.

[Table of ages illegible]

Copy and complete the following stem and leaf diagram. Use the diagram to identify two features of the shape of the distribution.

25 |
30 |
35 |

Find the mean age of the 30 people. Given that 18 of them are men and that the mean age of the men is 37.72, find the mean age of the 12 women. (MEI)

14. A travel agency has two shops, R and S. The number of holidays purchased in a particular week and the mean and standard deviation of the costs of these holidays at each shop are shown in the following table.

          Number of holidays    Mean cost (£)    SD (£)
Shop R    32                    190.35           10.4
Shop S    24                    202.25           15.5

Calculate the mean, and, to the nearest penny, the standard deviation of the costs of all the 56 holidays purchased.

15. Three random samples of 50, 30 and 20 bags respectively are taken from the production line of 12 kg bags of cat litter. The contents of each bag are then weighed. A summary of the results is shown in the table.

Sample    Size    Mean wt. (kg)    S.D. (kg)
1         50      [illegible]      0.5
2         30      12.1             0.9
3         20      11.7             [illegible]

Find, in kilograms to two decimal places, the mean weight per bag and the standard deviation for the 100 bags. (L)

16. The average height of 20 boys is 160 cm, with a standard deviation of 4 cm. The average height of 30 girls is 155 cm, with a standard deviation of 3.5 cm. Find the standard deviation of the whole group of 50 children.


SCALING SETS OF DATA 


Example 1.29 


Sweets are packed into bags with a nominal mass of 75 g. Ten bags are picked at random from the production line and weighed. Their masses, in grams, are:

76, 74.2, 75.1, 73.7, 72, 74.3, 75.4, 74, 73.1, 72.8

(a) Use your calculator to find the mean mass and the standard deviation.

It was later discovered that the scales were reading 3.2 g below the correct weight.

(b) What was the correct mean mass of the ten bags and the correct standard deviation?

(c) Compare your answers to (a) and (b) and comment.

Solution 1.29

(a) According to the scales, with measurements given in grams,

$\bar{x} = 74.06$, $s = 1.166... = 1.17$ (2 d.p.)

(b) The correct readings are:

79.2, 77.4, 78.3, 76.9, 75.2, 77.5, 78.6, 77.2, 76.3, 76

$\bar{x} = 77.26$, $s = 1.166... = 1.17$ (2 d.p.)

(c) Notice that 77.26 − 74.06 = 3.2, i.e. correct mean − original mean = 3.2.

So correct mean = original mean + 3.2, and correct s.d. = original s.d.

If each reading is increased by 3.2, then the mean is increased by 3.2. The standard deviation, however, remains unaltered.

Showing the two sets of readings on a graph helps to show that although the mean increased, the spread of the data about the mean remained the same.

[Diagram: dot plots of the original data and the new data on a common mass scale, with the original mean and the new mean marked]

In general, if each number is increased by a constant c, so that y = x + c, then

$\bar{y} = \bar{x} + c$ and $s_y = s_x$

Now consider what happens when each number in a set of readings is multiplied by a constant.

For the four numbers 2, 3.5, 5, 6: $\bar{x} = 4.125$, $s_x = 1.515...$

Multiplying each number by 3 to obtain y, where y = 3x, gives the numbers 6, 10.5, 15, 18.

For these, $\bar{y} = 12.375$, $s_y = 4.546...$

Now 12.375 ÷ 4.125 = 3, so $\bar{y} = 3\bar{x}$, and 4.546... ÷ 1.515... = 3, so $s_y = 3s_x$.

[Diagram: dot plots of the original data (2, 3.5, 5, 6) and the new data (6, 10.5, 15, 18), with the original mean and the new mean marked]

You can see from the diagram that the new set of data is much more spread out.

In general, if each number is multiplied by a constant k, so that y = kx, then
• the mean is multiplied by k,
• the standard deviation is multiplied by |k|, where |k| is the positive value of k.

$\bar{y} = k\bar{x}$ and $s_y = |k|s_x$

For example, if $y = -\tfrac{1}{2}x$, then $\bar{y} = -\tfrac{1}{2}\bar{x}$ and $s_y = \tfrac{1}{2}s_x$.

Combining these two results, if y = ax + b, where a and b are constants, then

$\bar{y} = a\bar{x} + b$ and $s_y = |a|s_x$

Example 1.30

Joe's mean mark for the physics tests for the term was 72. His teacher decided to scale all the marks according to the formula $y = 1\tfrac{1}{4}x - 6$, where y is the new mark and x the original mark. Find Joe's new mean mark.

Solution 1.30

$y = 1\tfrac{1}{4}x - 6$

$\bar{y} = 1\tfrac{1}{4}\bar{x} - 6 = 1\tfrac{1}{4} \times 72 - 6 = 84$

Joe's new mean mark is 84.

Example 1.31

The standard deviation of three numbers a, b, c is 3.2.
(a) State the standard deviation of the three numbers 3a, 3b, 3c.
(b) State the standard deviation of the three numbers a + 2, b + 2, c + 2.
(c) State the standard deviation of the three numbers 2a + 5, 2b + 5, 2c + 5. (C)

Solution 1.31

(a) If y = 3x, then $s_y = 3s_x = 3 \times 3.2 = 9.6$
(b) If y = x + 2, then $s_y = s_x = 3.2$
(c) If y = 2x + 5, then $s_y = 2s_x = 2 \times 3.2 = 6.4$
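The rules $\bar{y} = a\bar{x} + b$ and $s_y = |a|s_x$ are easy to confirm numerically. This sketch uses the standard-library `statistics` module (`pstdev` has divisor n); `transform_summary` is an illustrative name, and the data sets are those of Example 1.29 and the multiplication example above.

```python
from statistics import mean, pstdev

def transform_summary(data, a, b):
    """Return (mean, sd) of data and of y = a*x + b, both with divisor n."""
    ys = [a * x + b for x in data]
    return (mean(data), pstdev(data)), (mean(ys), pstdev(ys))

masses = [76, 74.2, 75.1, 73.7, 72, 74.3, 75.4, 74, 73.1, 72.8]  # Example 1.29
print(transform_summary(masses, 1, 3.2))           # adding 3.2: mean shifts, sd unchanged
print(transform_summary([2, 3.5, 5, 6], 3, 0))     # multiplying by 3: both scale by 3
print(transform_summary([2, 3.5, 5, 6], -0.5, 0))  # k = -1/2: mean times k, sd times |k|
```

The first line should show a mean of 74.06 rising to 77.26 with the standard deviation (about 1.166) unchanged.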


Comparing data by scaling 


If you wish to compare two sets of data, for example examination marks in two papers, you can scale one of the sets of data so that the two means are the same and the two standard deviations are the same.
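Finding the scaling y = ax + b amounts to solving $s_y = a s_x$ and $\bar{y} = a\bar{x} + b$. The sketch below does exactly that; `scaling_coefficients` is a hypothetical helper, it assumes a > 0 (so the order of the marks is preserved), and it is tried out on the examination and project figures from Example 1.32, which follows.

```python
def scaling_coefficients(mean_x, sd_x, mean_y, sd_y):
    """Return a, b so that y = a*x + b has mean mean_y and sd sd_y.

    Takes a > 0 so that the order of the original values is preserved.
    """
    a = sd_y / sd_x          # from s_y = a * s_x
    b = mean_y - a * mean_x  # from y-bar = a * x-bar + b
    return a, b

# Examination paper: mean 62, sd 16, scaled to mean 50, sd 20
a, b = scaling_coefficients(62, 16, 50, 20)
print(a, b, a * 80 + b)          # Anna's examination mark of 80

# Project: mean 37, sd 6, scaled to mean 50, sd 20
c, d = scaling_coefficients(37, 6, 50, 20)
print(round(c * 46 + d, 6))      # Anna's project mark of 46
```

The output should be a = 1.25, b = −27.5 with a standardised mark of 72.5, and a standardised project mark of 80.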


Example 1.32 


For students on an Electronics course the assessment consists of two components: a written examination paper and a project. The marks for the examination paper are distributed with a mean of 62 and a standard deviation of 16. Those for the project have a mean of 37 and a standard deviation of 6. Anna, a student on the course, scored 80 marks on the examination paper and 46 marks for her project.

(a) Convert each of Anna's marks into a standardised score, such that, for each component, the mean and standard deviation for all students on the course are 50 and 20, respectively.

(b) Hence compare Anna's relative performance in the two assessment components. (NEAB)


Solution 1.32

(a) Standardised values: $\bar{y} = 50$, $s_y = 20$

Examination: $\bar{x} = 62$, $s_x = 16$

Let y = ax + b, then $\bar{y} = a\bar{x} + b$ .....(1)

Now $s_y = a s_x$
20 = a × 16
a = 1.25

Substituting in (1): 50 = 62 × 1.25 + b
b = −27.5

The transformation for the examination paper is y = 1.25x − 27.5.

When x = 80, y = 1.25 × 80 − 27.5 = 72.5

Anna's standardised mark for the examination is 72.5.

Project: $\bar{x} = 37$, $s_x = 6$

Let y = cx + d, then $\bar{y} = c\bar{x} + d$ .....(2)

Now $s_y = c s_x$
20 = 6c
c = 3⅓

Substituting in (2): 50 = 37 × 3⅓ + d
d = −73⅓

The transformation for the project is y = 3⅓x − 73⅓.

When x = 46, y = 3⅓ × 46 − 73⅓ = 80

Anna's standardised mark for the project is 80.

(b) Relatively, Anna performed better on the project than in the examination.

Exercise 1h Scaling sets of data

1. (a) Find the mean and the standard deviation of the set of numbers 4, 6, 9, 3, 5, 6, 9.
(b) Deduce the mean and the standard deviation of the set of numbers 514, 516, 519, 513, 515, 516, 519.
(c) Deduce the mean and the standard deviation of the set of numbers 52, 78, 117, 39, 65, 78, 117.

2. A set of numbers has a mean of 22 and a standard deviation of 6. If 3 is added to each number of the set, and each resulting number is then doubled, find the mean and standard deviation of the new set. (C Additional)

3. A set of values of a variable X has a mean μ and a standard deviation σ. State the new value of the mean and of the standard deviation when each of the variables is (a) increased by k, (b) multiplied by p. Values of a new variable Y are obtained by using the formula Y = 3X + 5. Find the mean and the standard deviation of the set of values of Y. (C Additional)

4. Show that the standard deviation of the integers 1, 2, 3, 4, 5, 6, 7 is 2.
Using this result find the standard deviation of the numbers
(a) 101, 102, 103, 104, 105, 106, 107.
(b) 100, 200, 300, 400, 500, 600, 700.
(c) 2.01, 3.02, 4.03, 5.04, 6.05, 7.06, 8.07.
(d) Write down seven integers which have mean 5 and standard deviation 6.
(L Additional)


5. It is proposed to convert a set of marks whose mean is [illegible] and standard deviation is 4 to a set of marks with mean 61 and standard deviation 3. An equation for the transformation necessary to convert the marks is y = ax + b. Find
(a) the values of a and b,
(b) the value of the scaled mark which corresponds to a mark of 64 in the original data,
(c) the value in the original data if the scaled mark is 79.

6. The marks of five students in a mathematics test were 27, 31, 35, 47, 50.
(a) Calculate the mean mark and the standard deviation.
(b) The marks are scaled so that the mean and standard deviation become 50 and 20 respectively. Calculate, to the nearest whole number, the new marks corresponding to the original marks of 31 and 50.
(C Additional)

7. It is proposed to convert a set of values of a variable X, whose mean and standard deviation are 20 and 5 respectively, to a set of values of a variable Y whose mean and standard deviation are 42 and 8 respectively. If the conversion formula is Y = aX + b, calculate the values of a and of b. (C Additional)

8. In order to compare the performances of candidates in two schools a test was given. The mean mark at school A was 45, and the mean mark at school B was 31 with a standard deviation of 5. The marks of school A are scaled so that the mean and standard deviation are the same as school B, and a mark of 85 at school A becomes 63. Find the values of a and b if the transformation used is y = ax + b. Find also the original standard deviation of the marks from school A.


9. The following is a set of 109 examination marks, ordered for convenience.

[illegible] 17 18 20
23 24 25 25 25 25 26 26
[illegible] 28 28 29 29 29 30 31
32 32 33 33 34 34 35 36
37 37 37 38 38 38 39 39
39 39 39 39 40 40 40 40
41 41 41 42 42 42 42 43
44 45 46 46 47 47 47 47
50 51 51 52 52 52 53 53
54 54 55 57 58 58 59 59 [illegible] 62
63 64 66 66 67 70 76 77 82

(a) Construct a grouped frequency distribution with a class width of 10 and starting with [illegible].
(b) Draw a histogram and comment on the shape of the distribution.
(c) Using the frequency table estimate the mean and standard deviation of the marks.
(d) The marks are to be scaled linearly by the relation Y = a + bX, where X is the old mark and Y the new mark. The new mean and standard deviation are to be 50 and [illegible] respectively. Using your estimates in (c) calculate suitable values for a and b.

10. The mean of the marks scored by candidates in an examination is 45. These marks are scaled linearly to give a mean of 50 and a standard deviation of 5. Given that the scaled mark 80 corresponds to an original mark of 70, calculate
(a) the standard deviation of the original marks,
(b) the mark which is unchanged by the scaling.
Given that the greatest and least scaled marks are 92 and 2 respectively, calculate the corresponding original marks. (C Additional)

USING A METHOD OF CODING TO FIND THE MEAN AND STANDARD DEVIATION

Example 1.33

Salt is packed in bags which the manufacturer claims contain 25 kg each. Eighty bags are examined and the mass, x kg, of each is found. The results are $\sum (x - 25) = 27.2$, $\sum (x - 25)^2 = 85.1$. Find the mean and the standard deviation of the masses.

Solution 1.33

You do not know the actual masses, and coding has been used to summarise the results. The coding is y = x − 25, where $\sum y = 27.2$ and $\sum y^2 = 85.1$.

Therefore $\bar{y} = \frac{\sum y}{n} = \frac{27.2}{80} = 0.34$ and $s_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{85.1}{80} - 0.34^2} = 0.9737...$

Now if y = x − 25, then x = y + 25.

Therefore $\bar{x} = \bar{y} + 25 = 25.34$

Also $s_x = s_y$, so $s_x = 0.9737...$

The mean mass is 25.34 kg and the standard deviation is 0.97 kg (2 d.p.).

In general, if a set of numbers $x_1, x_2, \ldots, x_n$ is transformed to the set of numbers $y_1, y_2, \ldots, y_n$ by means of the coding $y = \frac{x - a}{b}$, then $x = by + a$, so $\bar{x} = b\bar{y} + a$ and $s_x = |b|s_y$.

NOTE: The value 25 used here is sometimes known as the assumed mean.

Example 1.34

Use the coding $y = \frac{x - 200\,000}{25\,000}$ to find the mean and standard deviation of the following:

x    125 000    150 000    175 000    200 000    225 000    250 000    275 000
f    5          19         27         35         24         12         3

Solution 1.34

$y = \frac{x - 200\,000}{25\,000}$, i.e. $x = 25\,000y + 200\,000$

x          y     f     fy     fy²
125 000    −3    5     −15    45
150 000    −2    19    −38    76
175 000    −1    27    −27    27
200 000    0     35    0      0
225 000    1     24    24     24
250 000    2     12    24     48
275 000    3     3     9      27
                 Σf = 125    Σfy = −23    Σfy² = 247

$\bar{y} = \frac{\sum fy}{\sum f} = \frac{-23}{125} = -0.184$

$s_y = \sqrt{\frac{\sum fy^2}{\sum f} - \bar{y}^2} = \sqrt{\frac{247}{125} - (-0.184)^2} = 1.393...$

$\bar{x} = 25\,000\bar{y} + 200\,000 = 25\,000 \times (-0.184) + 200\,000 = 195\,400$

$s_x = 25\,000 s_y = 25\,000 \times 1.393... = 34\,840.2...$

The mean is 195 400 and the standard deviation is 34 800 (3 s.f.).
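Decoding can be expressed as one small function. This is a sketch, assuming the coding $y = \frac{x - a}{b}$ described above and the divisor-n standard deviation; `decode` is an illustrative name. It is applied to Example 1.33 (a = 25, b = 1) and to Example 1.34 (a = 200 000, b = 25 000).

```python
from math import sqrt

def decode(n, sum_y, sum_y2, a, b):
    """Mean and sd of x from coded sums, where y = (x - a) / b.

    Since x = b*y + a: x-bar = b*y-bar + a and s_x = |b| * s_y (divisor n).
    """
    y_bar = sum_y / n
    s_y = sqrt(sum_y2 / n - y_bar ** 2)
    return b * y_bar + a, abs(b) * s_y

# Example 1.33: y = x - 25, n = 80
print(decode(80, 27.2, 85.1, 25, 1))

# Example 1.34: y = (x - 200 000) / 25 000 with frequencies f
xs = [125_000, 150_000, 175_000, 200_000, 225_000, 250_000, 275_000]
fs = [5, 19, 27, 35, 24, 12, 3]
ys = [(x - 200_000) / 25_000 for x in xs]
print(decode(sum(fs), sum(f * y for f, y in zip(fs, ys)),
             sum(f * y * y for f, y in zip(fs, ys)), 200_000, 25_000))
```

The first pair should be about (25.34, 0.974) and the second about (195 400, 34 840).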


Exercise 1i Coding

1. Find the mean and the standard deviation of the following sets of data, using the coding indicated:
(a) [table illegible]
(b) [coding illegible]

Interval           f
100 < x < 200      3
200 < x < 300      30
300 < x < 400      12
400 < x < 500      18
500 < x < 600      42
600 < x < 700      6

(c) Coding $y = \frac{x - 0.0225}{0.005}$

Interval    f
0–          5
0.005–      40
0.01–       3
0.015–      18
[remaining rows illegible]

2. For a particular set of data $n = 100$, $\sum (x - 50) = 423.5$, $\sum (x - 50)^2 = 238.4$. Find the mean and the standard deviation of x.

3. Find the variance of x if $\sum f(x - 100) = 127$, $\sum f(x - 100)^2 = 25\,935$, $\sum f = 20$.

4. Each morning for a month the owner of a smallholding timed how long it took to feed the animals. The results were as shown:

[Table of times (min) and frequencies illegible]

Calculate the mean time taken to feed the animals, using a method of coding.

5. The table shows the times taken on [illegible] consecutive days for a coach to complete one journey on a particular route. Times have been given to the nearest minute. Find the mean time for the journey and the standard deviation, using a method of coding.

[Table illegible]

6. In a practical class students timed how long it took for a sample of their saliva to break down a 2% starch solution. The times, to the nearest second, are shown in the table below. Find the mean time, using a method of coding.

Time (seconds)    Frequency
[illegible]
24-30             2
31-40             [illegible]
41-50             [illegible]
51-60             8
61-70             2
71-90             4

CUMULATIVE FREQUENCY

The cumulative frequency is the total frequency up to a particular item. A cumulative frequency distribution can be obtained from a frequency distribution and can be illustrated
(a) when the data are discrete and ungrouped: by drawing a step diagram,
(b) when the data are continuous or in the form of a grouped distribution: by drawing a cumulative frequency polygon or curve.

(a) Cumulative frequency step diagrams for discrete ungrouped data

The table shows the number of attempts needed to pass the driving test by 100 candidates at a particular test centre.

Number of attempts                   1     2     3     4    5    6
Frequency (number of candidates)     33    42    13    6    4    2

The cumulative frequency distribution is formed as follows:

Number of attempts      ≤ 1    ≤ 2    ≤ 3    ≤ 4    ≤ 5    ≤ 6
Cumulative frequency    33     75     88     94     98     100

So 33 people took 1 attempt, 75 people took ≤ 2 attempts, 88 people took ≤ 3 attempts, and so on.

Plot the cumulative frequency against the number of attempts and decide how to join the points.


If you joi 
join the poi i 
a points with strai i A and der a cw 
‘hen £50, this te ight lines, such as from A to B, and con: ider a cumulative 
suggest that 50 people took < 4 4 ee i : nse. 
i < 1.4 attempts which is n 
Wuency of SQ, this would sense. 


a CONCISE 


60 A 


Clearly it is not sensible to join the po 


(and usually integet 
as shown: 


jscrete 
hen data are disc 
ss of steps 


joined by a series 


Cumulativ 


i au 
The step diagram 1S necessary bec 


intervals 1 to 2,2 to 3,- 


cumulative frequen 


" Seanad that when the data 


This means 
ie. the 50th person 
To find bow many took up to 
axis, to the top : 
‘This shows that 94 candi 
If you go to the bottom 9. : 
fewer than four attempts (8 


(b) 


tit only make 


Notice tha ee 


It would be silly to cons 


Note that in a step diagram, 
‘steepest’ Step- 


From the graph above, 


ints directly. 


e frequency graph 


_ but Gump” 
cy of 50. From the graph, the 


ook two attempts- 


le 
the step, then £° 
: dates took up to 


£ the step, this te 


5 sense when you 


the mode is give 


ints can be 
) values and also are ungroupeds the pou 


of number of attempts 


—t 


wal 
re 


ed evenly throughout the 


distribut' 
a n 2 to 3 and so on. 


re 
se the values 2 

or ‘step up’ from 1 to 2, the 
er of attempts is tWO- 


b . 
es 50th item 1s 2s 


are place in ascendins order or S ‘Ze. the 
i d di d f si 
S! 8 ? 


y from 4 on the horizontal 


four attempts, £0 UP vertical 


i is. 
ft to the cumulative frequency ax 


tS. . 
Jeera the nambet of candidates who took 


in this case). | : 
e values on the horizontal 4%! 


read from the discret 


attempts, for example. 
he variable that gives the 


n by the value of 


the mode is two. 




(b) Cumulative frequency polygons and curves for grouped data 


Consider this situation: 


Six weeks after planting, the heights of 30 broad bean plants were measured and the frequency distribution formed as shown.

Height, x cm   3 < x ≤ 6   6 < x ≤ 9   9 < x ≤ 12   12 < x ≤ 15   15 < x ≤ 18   18 < x ≤ 21
Frequency      1           2           11           10            5             1

The cumulative frequency is calculated up to each upper class boundary.
The upper class boundaries are 6, 9, 12, 15, 18, 21.

The lower boundary of the first class is 3. This is inserted for completeness.

Height, x cm           ≤3   ≤6   ≤9   ≤12   ≤15   ≤18   ≤21
Cumulative frequency   0    1    3    14    24    29    30

The final cumulative frequency, 30, is the total number of plants.
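As a quick check on the arithmetic, the running totals can be produced with Python's `itertools.accumulate`. This is a sketch only: the class frequencies 1, 2, 11, 10, 5, 1 are the differences of successive cumulative frequencies, and the variable names are illustrative.

```python
from itertools import accumulate

# Broad bean heights: class boundaries (cm) and the frequency of each class
boundaries = [3, 6, 9, 12, 15, 18, 21]
frequencies = [1, 2, 11, 10, 5, 1]   # classes 3-6, 6-9, ..., 18-21

# Cumulative frequency at each upper class boundary; 0 is placed at the
# lower boundary of the first class, as in the table.
cumulative = [0] + list(accumulate(frequencies))

for boundary, cf in zip(boundaries, cumulative):
    print(f"height <= {boundary:2d} cm: {cf}")
```

The last cumulative frequency always equals the total frequency, which is a useful check on the table.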


This can be shown diagrammatically in a cumulative frequency graph. 


The cumulative frequencies are plotted against the upper class boundaries and the points are 
joined as follows: 
(i) for a cumulative frequency polygon, join the points with straight lines, indicating that you are assuming that the readings are evenly distributed throughout the interval. This ties in with the fact that you draw horizontal lines at the tops of the blocks in a histogram.

(ii) for a cumulative frequency curve, join the points with a smooth curve. In this case you are assuming a distribution of readings throughout the interval which might not be even.


(Figure: Cumulative frequency curve to show the heights of 30 broad bean plants. Horizontal axis: height (cm); vertical axis: cumulative frequency.)




Values can be estimated from the graph. Note that the graph can be read in either direction.

(i) To find the number of plants that were less than 10.5 cm tall:
• Find the height 10.5 cm on the horizontal axis.
• Draw a vertical line up from 10.5 to meet the curve.
• Draw a horizontal line to the cumulative frequency axis and read the value.
From the graph, seven plants were less than 10.5 cm tall.

(ii) To find x where 90% of the plants were less than x cm tall:
• 90% of 30 = 27.
• Find 27 on the vertical axis and draw a horizontal line to meet the curve.
• Draw a vertical line to the horizontal or 'height' axis and read the value.
From the graph, 27 plants were less than 16.5 cm tall, so x = 16.5.

Example 1.35
A survey is carried out to determine the numbers of pupils in various age groups who are attending nurseries, schools and colleges within a certain area. The results are summarised in the following histogram.

(Figure: Histogram of the numbers of pupils by age. Horizontal axis: age in years; vertical axis: frequency density.)

(a) Copy and complete the following table showing the ages of the pupils and the corresponding cumulative frequencies.

Age in years up to     3   5   7   11   16   18
Cumulative frequency   0                     5600

(b) Draw a cumulative frequency diagram for the distribution.
(c) Use your cumulative frequency diagram to estimate the age exceeded by 30% of the pupils in the survey. (NEAB)

Solution 1.35
Frequency = frequency density × width, so for 3 < age ≤ 5, f = 200 × 2 = 400.
Calculating the other frequencies gives:

Age in years                   3 < x ≤ 5   5 < x ≤ 7   7 < x ≤ 11   11 < x ≤ 16   16 < x ≤ 18
Frequency (number of pupils)   400         800         1800         2000          600

(a) The cumulative frequency table is:

Age in years up to     3   5     7      11     16     18
Cumulative frequency   0   400   1200   3000   5000   5600

(b) (Figure: Cumulative frequency polygon showing numbers of pupils in various age groups.)

(c) If 30% of the pupils exceed a certain age, then 70% of pupils are younger than this age.
Now 70% of 5600 = 3920.
From the graph, 3920 pupils have an age up to 13.3 years, i.e. approximately 13 years 4 months, so 30% of the pupils are older than 13 years 4 months.
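Reading a cumulative frequency polygon is the same as interpolating linearly between the plotted points. The sketch below (the function name `read_polygon` is illustrative, not from the text) reproduces the reading for part (c); a smooth curve drawn by hand may give a slightly different value.

```python
def read_polygon(boundaries, cumulative, target):
    """Return the x-value at which a cumulative frequency polygon reaches `target`.

    The polygon joins the points with straight lines, i.e. it assumes the
    readings are spread evenly within each class.
    """
    for i in range(1, len(boundaries)):
        low_cf, high_cf = cumulative[i - 1], cumulative[i]
        if high_cf > low_cf and low_cf <= target <= high_cf:
            fraction = (target - low_cf) / (high_cf - low_cf)
            width = boundaries[i] - boundaries[i - 1]
            return boundaries[i - 1] + fraction * width
    raise ValueError("target lies outside the cumulative frequencies")


ages = [3, 5, 7, 11, 16, 18]                  # class boundaries (years)
cum_pupils = [0, 400, 1200, 3000, 5000, 5600]

# 30% are older than the required age, so 70% are younger than it.
target = cum_pupils[-1] * 70 / 100            # 3920 pupils
age = read_polygon(ages, cum_pupils, target)
print(f"{age:.1f} years")                     # 13.3 years
```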


Example 1.36
Students were asked how long it took them to travel to college on a particular morning. A cumulative frequency distribution was formed:

Time taken (minutes)   Cumulative frequency
<5                     28
<10                    45
<15                    81
<20                    143
<25                    280
<30                    349
<35                    374
<40                    395
<45                    400

(a) Draw a cumulative frequency polygon.
(b) Estimate how many students took less than 18 minutes.
(c) Taking equal class intervals of 0-, 5-, 10-, ..., construct a frequency distribution and draw a histogram.

Solution 1.36

(a) (Figure: Cumulative frequency polygon of time taken to travel to college.)

(b) From the polygon, approximately 118 students took less than 18 minutes.

(c) Form the frequency distribution and calculate the frequency density for each interval, where frequency density = frequency ÷ class width. Note that the width of each interval is 5 minutes.

Time (min)   Frequency           Frequency density
0-           28                  5.6
5-           45 − 28 = 17        3.4
10-          81 − 45 = 36        7.2
15-          143 − 81 = 62       12.4
20-          280 − 143 = 137     27.4
25-          349 − 280 = 69      13.8
30-          374 − 349 = 25      5.0
35-          395 − 374 = 21      4.2
40-45        400 − 395 = 5       1.0
             Total = 400

(Figure: Histogram of time taken. Horizontal axis: time (minutes); vertical axis: frequency density.)
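The working in parts (b) and (c) can be checked in Python. This is a sketch under the same assumption as the polygon, namely that times are spread evenly within each 5-minute class; the variable names are illustrative.

```python
width = 5                                          # every class is 5 minutes wide
upper = [5, 10, 15, 20, 25, 30, 35, 40, 45]        # upper class boundaries (minutes)
cum = [28, 45, 81, 143, 280, 349, 374, 395, 400]   # cumulative frequencies

# (c) Frequency of each class = difference of successive cumulative frequencies.
freq = [b - a for a, b in zip([0] + cum[:-1], cum)]
density = [f / width for f in freq]                # frequency density = f / width

for u, f, d in zip(upper, freq, density):
    print(f"{u - width:2d}-{u:2d} min: f = {f:3d}, density = {d:.1f}")

# (b) 18 minutes lies in the 15-20 class (81 at 15 min, 143 at 20 min);
# move 3/5 of the way along that section of the polygon.
i = upper.index(20)
under_18 = cum[i - 1] + (18 - 15) / width * (cum[i] - cum[i - 1])
print(f"about {under_18:.0f} students took less than 18 minutes")
```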


CUMULATIVE PERCENTAGE FREQUENCY DIAGRAMS

These are particularly useful when two or more distributions are to be compared. For example, suppose we have the examination marks of 200 boys and 300 girls in Year 8.

Mark   Cumulative frequency (boys)   Cumulative frequency (girls)
<10    6                             0
<20    22                            6
<30    60                            12
<40    140                           24
<50    172                           42
<60    188                           75
<70    196                           120
<80    198                           246
<90    200                           294
<100   200                           300

Obtain the cumulative percentage frequencies as follows:

Mark   Boys (total 200)         Girls (total 300)
<10    6/200 × 100 = 3%         0/300 × 100 = 0%
<20    22/200 × 100 = 11%       6/300 × 100 = 2%
<30    60/200 × 100 = 30%       12/300 × 100 = 4%
<40    140/200 × 100 = 70%      24/300 × 100 = 8%
<50    172/200 × 100 = 86%      42/300 × 100 = 14%
<60    188/200 × 100 = 94%      75/300 × 100 = 25%
<70    196/200 × 100 = 98%      120/300 × 100 = 40%
<80    198/200 × 100 = 99%      246/300 × 100 = 82%
<90    200/200 × 100 = 100%     294/300 × 100 = 98%
<100   200/200 × 100 = 100%     300/300 × 100 = 100%

(Figure: Cumulative percentage frequency curves for the boys' and girls' results. Horizontal axis: mark; vertical axis: cumulative % frequency.)

Great care must be taken when comparing these curves. A common mistake is to think that the boys have done better than the girls because the boys' graph is higher than the girls' graph. In fact a higher cumulative curve means that a greater proportion of candidates scored below each mark. If you calculate the corresponding percentage frequencies and draw the histograms, you will see that the boys have not done better.

Boys
Mark   % frequency   Frequency density
0-     3%            0.3
10-    8%            0.8
20-    19%           1.9
30-    40%           4.0
40-    16%           1.6
50-    8%            0.8
60-    4%            0.4
70-    1%            0.1
80-    1%            0.1
90-    0%            0
Total  100%

Girls
Mark   % frequency   Frequency density
0-     0%            0
10-    2%            0.2
20-    2%            0.2
30-    4%            0.4
40-    6%            0.6
50-    11%           1.1
60-    15%           1.5
70-    42%           4.2
80-    16%           1.6
90-    2%            0.2
Total  100%

(Figure: Histograms of the boys' and girls' results. Horizontal axis: mark; vertical axis: frequency density.)

It is easy to see that the girls have done better than the boys. The modal class for the boys' marks is 30-39, whereas the modal class for the girls' marks is 70-79. The type of distribution for the boys' marks is said to be positively skewed and that for the girls' marks is said to be negatively skewed.
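Converting to percentages is what makes groups of different sizes comparable. The sketch below (helper names are illustrative) recomputes the percentage frequencies from the cumulative frequencies and picks out each modal class.

```python
marks_upper = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # "<10" covers marks 0-9, etc.
boys_cum = [6, 22, 60, 140, 172, 188, 196, 198, 200, 200]
girls_cum = [0, 6, 12, 24, 42, 75, 120, 246, 294, 300]


def cumulative_percent(cum):
    """Cumulative frequency as a percentage of the group total."""
    return [100 * c / cum[-1] for c in cum]


def class_percent(cum):
    """Percentage of the group in each class (differences of cumulative %)."""
    pct = cumulative_percent(cum)
    return [b - a for a, b in zip([0.0] + pct[:-1], pct)]


def modal_class(cum):
    """Class with the largest percentage frequency."""
    pct = class_percent(cum)
    i = pct.index(max(pct))
    lower = marks_upper[i] - 10
    return f"{lower}-{lower + 9}"


print("Boys :", class_percent(boys_cum), modal_class(boys_cum))
print("Girls:", class_percent(girls_cum), modal_class(girls_cum))
```

Dividing each percentage by the class width of 10 gives the frequency densities in the tables.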


MEDIAN, QUARTILES AND PERCENTILES

The median is an average that is unaffected by extreme values. It is often described as the middle value. For a set of observations arranged in order of size, the median is the value 50% of the way through the distribution.

Other quantities that are unaffected by extreme values are the quartiles and percentiles. These are useful in giving an idea of the variability or spread of the data.

For data arranged in order of size:
• the lower quartile $Q_1$ is the value 25% of the way through the distribution,
• the upper quartile $Q_3$ is the value 75% of the way through the distribution,
• the $n$th percentile, $P_n$, is the value $n$% of the way through the distribution.

Therefore the median (sometimes called $Q_2$) is the 50th percentile, the lower quartile is the 25th percentile and the upper quartile is the 75th percentile. The quartiles, together with the median, split the distribution into four equal parts.

Interquartile range

The difference between the quartiles, $Q_3 - Q_1$, is known as the interquartile range. It tells you the range of the middle 50% of the distribution and so is also unaffected by extreme values.

Interquartile range = upper quartile − lower quartile = $Q_3 - Q_1$

Interpercentile range

Ranges between various percentiles can be found. For example, the range giving the middle 80% of the readings is found by subtracting the 10th percentile from the 90th percentile, i.e. $P_{90} - P_{10}$.

When finding the median and percentiles, it is important to take note of whether the data are grouped or ungrouped.

Ungrouped data – median

For ungrouped data consisting of $n$ observations in order of size, the median is the $\tfrac{1}{2}(n+1)$th observation.

(a) Consider this set of numbers: 7, 7, 2, 3, 4, 2, 7, 9, 31.
There are nine numbers, so the median is the $\tfrac{1}{2}(9+1)$th observation, i.e. the 5th observation.
Arranging them in order gives 2, 2, 3, 4, (7), 7, 7, 9, 31.
The median is 7.

(b) Consider this set of numbers: 36, 41, 27, 32, 29, 39, 39, 43.
There are eight numbers, so the median is the $\tfrac{1}{2}(8+1)$th observation, i.e. the 4.5th observation. As this does not exist, find the value halfway between the 4th and 5th observations.
Arranging the numbers in order of size gives 27, 29, 32, |36, 39|, 39, 41, 43.
The median is halfway between 36 and 39, so median = $\tfrac{1}{2}(36 + 39) = 37.5$.

Note that
• if there is an odd number of observations, the median is the middle value,
• if there is an even number of observations, there are two middle values. If these are $c$ and $d$, the median is halfway between them, i.e. $\tfrac{1}{2}(c + d)$.
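The rule for ungrouped data translates directly into code. A minimal sketch follows; for comparison it also calls `statistics.median` from Python's standard library, which uses the same convention of averaging the two middle values when n is even.

```python
import statistics


def median(values):
    """Median of ungrouped data: the (n + 1)/2-th value in order of size."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: halfway between c and d


set_a = [7, 7, 2, 3, 4, 2, 7, 9, 31]
set_b = [36, 41, 27, 32, 29, 39, 39, 43]

for data in (set_a, set_b):
    print(median(data), statistics.median(data))
```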


Ungrouped data – quartiles

The quartiles should divide in half the two distributions either side of the median, for example:

(a) $Q_1 = 5$, $Q_3 = 19$
Interquartile range = $Q_3 - Q_1$ = 19 − 5 = 14

(b) $Q_1 = 23$, $Q_3 = 27$
Interquartile range = $Q_3 - Q_1$ = 27 − 23 = 4

(c) 147, 150 | 154, 158 | 159, 162 | 164, 165
$Q_1 = \tfrac{1}{2}(150 + 154) = 152$, $Q_2 = \tfrac{1}{2}(158 + 159) = 158.5$, $Q_3 = \tfrac{1}{2}(162 + 164) = 163$
Interquartile range = $Q_3 - Q_1$ = 163 − 152 = 11

(d) 10, 12 | 13, 15, (19), 19, 24 | 26, 26
$Q_1 = \tfrac{1}{2}(12 + 13) = 12.5$, $Q_2 = 19$, $Q_3 = \tfrac{1}{2}(24 + 26) = 25$
Interquartile range = $Q_3 - Q_1$ = 25 − 12.5 = 12.5

Sometimes the following rule is used to find the quartiles:
$Q_1 = \tfrac{1}{4}(n + 1)$th value, $Q_3 = \tfrac{3}{4}(n + 1)$th value.

This rule agrees with the method above when $n$ is odd, but there is a discrepancy when $n$ is even. It does not, however, make a great deal of difference which method is used.
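The two methods can be compared in a short sketch. `quartiles_by_halves` follows the method above (the median of each half, leaving out the middle value when n is odd). For the ¼(n + 1) rule, a position such as the 2.25th value has to be interpreted; here it is assumed to mean moving a quarter of the way from the 2nd to the 3rd value, which is one common convention rather than something the text specifies. All function names are illustrative.

```python
def median(values):
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles_by_halves(values):
    """Q1, Q2, Q3 where Q1 and Q3 are the medians of the halves either side of Q2."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]            # values below the median
    upper = ordered[(n + 1) // 2 :]      # values above the median
    return median(lower), median(ordered), median(upper)


def value_at(ordered, position):
    """Value at a 1-based position, interpolating between neighbours if fractional."""
    k = int(position)
    frac = position - k
    if frac == 0:
        return ordered[k - 1]
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])


def quartiles_by_rule(values):
    """Q1 = (n + 1)/4-th value and Q3 = 3(n + 1)/4-th value."""
    ordered = sorted(values)
    n = len(ordered)
    return value_at(ordered, (n + 1) / 4), value_at(ordered, 3 * (n + 1) / 4)


c = [147, 150, 154, 158, 159, 162, 164, 165]     # example (c), n = 8 (even)
d = [10, 12, 13, 15, 19, 19, 24, 26, 26]         # example (d), n = 9 (odd)

for data in (c, d):
    print(quartiles_by_halves(data), quartiles_by_rule(data))
```

For the odd-sized set (d) both methods give 12.5 and 25; for the even-sized set (c) the rule gives 151 and 163.5 instead of 152 and 163, a small discrepancy of the kind described above.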


Example 1.37
A reaction time experiment was performed first with 21 girls and then with 24 boys. The results are shown on the stem and leaf diagram.

Reaction times

        Girls |   | Boys
            4 | 2 | 4
      3 3 2 2 | 2 | 2 2
        1 0 0 | 2 | 0 0 0 …
    9 9 (9) 8 8 | 1 | 8 8 8
        7 7 6 | 1 | 6 6 7 7
    5 5 5 4 4 | 1 | 4 5 5
              | 1 | 2 3
              | 1 | 0 1 1
              | 0 | 9

Key (Girls): 6 | 1 means 0.16 s
Key (Boys): 1 | 8 means 0.18 s

Find the median and the interquartile range for both sets of reaction times. Comment on your answers.

Solution 1.37

For the girls:
There are 21 girls, so the median is the $\tfrac{1}{2}(21 + 1)$th value, i.e. the 11th value. This value has been ringed on the diagram and is obtained by counting up from the bottom, 14, 14, 15, 15, ..., or counting down from the top, 24, 23, 23, 22, ..., until the 11th value is reached.
The median is 0.19 seconds.

Find the quartiles by dividing in half the two distributions either side of $Q_2$.
So $Q_1$ is the 5.5th value. This is halfway between the 5th and 6th values. On this stem and leaf diagram, count from the bottom up, so the 5th value is 0.15 and the 6th value is 0.16.
Therefore $Q_1$ is 0.155 seconds.
$Q_3$ is the 16.5th value. This is halfway between the 16th and 17th values, i.e. between 0.21 and 0.22.
Therefore $Q_3$ is 0.215 seconds.
The interquartile range = $Q_3 - Q_1$ = 0.215 − 0.155 = 0.06 seconds.

For the boys:
There are 24 boys, so the median is the $\tfrac{1}{2}(24 + 1)$th value, i.e. the 12.5th value. This is halfway between the 12th and 13th values, which are both 0.17 seconds.
So the median reaction time for the boys is 0.17 seconds.
$Q_1$ is the 6.5th value. This is halfway between the 6th and 7th values, which are 0.13 and 0.14.
Therefore $Q_1$ is 0.135 seconds.
$Q_3$ is the 18.5th value. This is halfway between the 18th and 19th values, which are both 0.2 seconds.
Therefore $Q_3$ is 0.2 seconds.
The interquartile range = $Q_3 - Q_1$ = 0.2 − 0.135 = 0.065 seconds.

Summary of results

                      Girls    Boys
Median                0.19 s   0.17 s
Interquartile range   0.06 s   0.065 s

These results confirm what the stem and leaf diagram shows: the girls tend to be slower than the boys to react, but there is more variability in the boys' results.
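The girls' reaction times can be read off the diagram and checked with the same halving method. The sketch works in hundredths of a second so that the arithmetic is exact; only the girls' leaves are used, and the variable names are illustrative.

```python
def median(values):
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


# Girls' reaction times in hundredths of a second, read from the stem and leaf diagram
girls = [14, 14, 15, 15, 15, 16, 17, 17, 18, 18,
         19, 19, 19, 20, 20, 21, 22, 22, 23, 23, 24]

ordered = sorted(girls)
n = len(ordered)                      # 21
q2 = median(ordered)                  # 11th value
q1 = median(ordered[: n // 2])        # median of the 10 values below Q2
q3 = median(ordered[(n + 1) // 2 :])  # median of the 10 values above Q2

print(f"median = {q2 / 100} s, Q1 = {q1 / 100} s, Q3 = {q3 / 100} s, "
      f"IQR = {(q3 - q1) / 100} s")
```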


To find the median and quartiles of data in the form of an ungrouped frequency distribution, it is useful to find the cumulative frequency, as this gives the total frequency up to a particular item.

Example 1.38
The table shows the number of children in the family for 35 families in a certain area. Find the median number of children per family and the interquartile range.

Number of children               0   1   2    …
Frequency (number of families)   3   5   12   …

Solution 1.38
The cumulative frequency distribution is formed as follows:

Number of children                 ≤0   ≤1   ≤2   …
Cumulative frequency (families)    3    8    20   …   33   35

Since there are 35 values, the median is the $\tfrac{1}{2}(35 + 1)$th value, i.e. the 18th value. There are eight families with ≤1 child and 20 families with ≤2 children, so the 18th value is 2.
Therefore the median number of children per family is 2.

Since $n$ is odd:
$Q_1 = \tfrac{1}{4}(35 + 1)$th value = 9th value = 2
$Q_3 = \tfrac{3}{4}(35 + 1)$th value = 27th value = 3

Therefore, interquartile range = 3 − 2 = 1 child per family.


Illustrating this on a 'step' diagram showing the cumulative frequency:

(a) To find the median, i.e. the 18th value, read across from 18 on the vertical axis, then down to 2.
Median = 2

(b) To find the lower quartile, i.e. the 9th value, read across from 9 on the vertical axis.
Lower quartile = 2

(c) To find the upper quartile, i.e. the 27th value, read across from 27 on the vertical axis.
Upper quartile = 3

Therefore interquartile range = 3 − 2 = 1
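Reading across a step diagram amounts to finding the first value whose cumulative frequency reaches the required position. A minimal sketch, with an illustrative function name `kth_value`, using only the first three columns of the cumulative table; these are enough for the median and the lower quartile, while the upper quartile needs the remaining columns.

```python
def kth_value(values, cumulative, k):
    """k-th observation in order of size, from an ungrouped cumulative table."""
    for value, cf in zip(values, cumulative):
        if cf >= k:          # the k-th item lies on this step
            return value
    raise ValueError("k is larger than the frequencies supplied")


children = [0, 1, 2]          # number of children
cum_families = [3, 8, 20]     # families with at most that many children

n = 35
median = kth_value(children, cum_families, (n + 1) // 2)   # 18th value
lower_q = kth_value(children, cum_families, (n + 1) // 4)  # 9th value
print(median, lower_q)
```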


Example 1.39
A teacher asked her class of thirty 14-year-olds how many novels they had read during the term. The results are illustrated in the cumulative frequency graph overleaf.
(a) Write down the mode.
(b) Find the median number of novels read.
(c) What percentage of the class read more than 5 novels?

(Figure: Cumulative frequency graph of the number of novels read.)

Solution 1.39
(a) The steepest step occurs when x = 3, so mode = 3 novels.
(b) There are 30 pupils, so the median is the $\tfrac{1}{2}(30 + 1)$th value, i.e. the 15.5th value, located between the 15th value and the 16th value.
From the graph, 15th value = 3 and 16th value = 4,
∴ median = 3.5 novels.
(c) From the graph, the number that read ≤5 novels is 21. The 22nd person must have read 6 novels.
Number who read more than 5 novels = 30 − 21 = 9
Percentage that read more than 5 novels = 9/30 × 100% = 30%


Exercise 1j Cumulative frequency, median and quartiles – ungrouped data

1. Find the median of each of the following sets of numbers:
(a) 4, 6, 18, 2
(b) 6, 18, 25, 9, 16, 22, 5
(c) 192, 217, 189, 210, 214, …
(d) 1267, 1836, 895, 3457, 2164
(e) …7, 0.4, 0.65, 0.78, 0.45, 0.32, 1.9, 0.0078

2. The table shows the scores obtained when a die is thrown 60 times. Find the median score.

Score, x       1    2   3   4    5   6
Frequency, f   12   9   8   13   …   …

3. These are the test marks of 14 students.
… 54, 38, 6, 72 …
Find
(a) the median,
(b) the lower quartile,
(c) the upper quartile,
(d) the interquartile range.

4. Find the median and interquartile range of the following distributions:
(a) Stem | Leaf        Key: 5 | 2 means 52
    1 | 0 5
    2 | 3 4 4
    3 | 1 2 8 8
    4 | 1 1 5 6 6 7
    5 | 2 3 3
    6 | 5 7 8 8
    7 | 2 4
    8 | 0
(b) (Stem and leaf diagram. Key: 1 | 2 means 12.)
(c) (Stem and leaf diagram.)

5. Find the median and interquartile range of each of the following frequency distributions.
(a) x   5   6   7   8   9   10
    f   …
(b) x   12   13   14   15   16
    f   3    9    11   15   7

6. The frequency table shows the number of goals scored in netball by Jemima in 25 games played.

Number of goals   0   1   2   3   4   5   6
Frequency         …

(a) Construct a cumulative frequency table.
(b) Draw a step diagram to illustrate the table.
(c) Find the median number of goals.
(d) Find the interquartile range.

7. This cumulative frequency step diagram shows the number of absences for each of a class of children during one term.

(Figure: Cumulative frequency step diagram of the number of absences.)

(a) Find the median number of absences.
(b) Find the range of the middle 50% of the observations.
(c) Calculate the mean number of absences.
(d) Calculate the standard deviation.

8. A researcher, studying the effectiveness of Family Income Supplement, carried out a survey of 420 families receiving the benefit. As part of the survey the researcher recorded the number of children in each family. The results are illustrated in the cumulative frequency graph below.

(Figure: Cumulative frequency graph of the number of children per family.)

(a) Write down the mode and the median of the number of children per family.
(b) Find the interquartile range of the number of children per family.
(c) Explain why the interquartile range is only a rough measure of spread for this type of distribution. (NEAB)

Grouped data – median and quartiles

When data have been grouped into intervals, the original information has been lost, so it is only possible to make estimates of the median and quartiles. One way of doing this is to use a cumulative frequency graph, or a cumulative percentage frequency graph, as follows:

(Figure: Cumulative frequency curve and cumulative percentage frequency curve with $Q_1$, $Q_2$ and $Q_3$ marked.)

Grouped data            Cumulative frequency curve     Cumulative percentage frequency curve
Lower quartile, $Q_1$   $\tfrac{1}{4}n$th reading      25% reading
Median, $Q_2$           $\tfrac{1}{2}n$th reading      50% reading
Upper quartile, $Q_3$   $\tfrac{3}{4}n$th reading      75% reading

Note that the $\tfrac{1}{2}(n + 1)$th reading is not used for the median. If you used this value you would not arrive at the same point on the cumulative frequency axis when you worked down from the top of the scale as you would when you worked up from the bottom of the scale. The $\tfrac{1}{2}n$th or 50% value is needed for the median.

Note also that, if preferred, a cumulative frequency polygon or a cumulative percentage frequency polygon can be drawn. The values obtained for the median and quartiles from polygons will not vary greatly from those obtained from curves.

Example 1.40
The table gives the cumulative distribution of the heights (in centimetres) of 400 children in a certain school:

Height (cm)            <100   <110   <120   <130   <140   <150   <160   <170
Cumulative frequency   0      27     85     215    320    370    395    400

(a) Draw a cumulative frequency curve.
(b) Estimate the median height.
(c) Determine the interquartile range.

(d) Determine the 10 to 90 percentile range.

Solution 1.40
(a) (Figure: Cumulative frequency curve to show the heights of 400 children. Horizontal axis: height (cm); vertical axis: cumulative frequency.)
(b) For the median, find the 200th value. From the graph, an estimate of the median is …
(c) For the lower quartile, $Q_1$, find the 100th value and for the upper quartile, $Q_3$, the 300th value.
The interquartile range = $Q_3 - Q_1$ = 16 cm
Note that this is the range of the middle 50% of the readings.
(d) For the 10th percentile, written $P_{10}$, find the value which is 10% of the way through the readings, i.e. the 40th value. The 90th percentile is the $\tfrac{90}{100}(400)$th value, i.e. the 360th value.
$P_{90}$ = 147 cm, $P_{10}$ = 113 cm
The 10 to 90 percentile range = $P_{90} - P_{10}$ = 147 − 113 = 34 cm

Example 1.41
The masses, measured to the nearest kilogram, of 50 boys are noted and a cumulative percentage frequency distribution formed.

Mass (kg)                <59.5   <64.5   <69.5   <74.5   <79.5   <84.5   <89.5
Cumulative % frequency   0       4       16      40      68      88      100

Draw a cumulative percentage frequency curve and use it to estimate the median mass and the interquartile range.

Solution 1.41
(Figure: Cumulative % frequency curve to show masses of 50 boys. Horizontal axis: mass (kg); vertical axis: cumulative % frequency.)

The median is the 50% reading. From the graph this is 76.3 kg.
The lower quartile, $Q_1$, is the 25% reading, so $Q_1$ = 71.5 kg.
The upper quartile, $Q_3$, is the 75% reading, so $Q_3$ = 80.5 kg.
The interquartile range = $Q_3 - Q_1$ = 80.5 − 71.5 = 9 kg.

It is interesting to note that if the data are represented by a histogram, the median divides the histogram into two equal areas.

(Figure: Histogram to show the masses of 50 boys, with the median marked. Horizontal axis: mass (kg), class boundaries 59.5, 64.5, ..., 89.5.)
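For comparison, the same readings can be taken from a cumulative percentage polygon (straight lines between the plotted points) rather than a smooth curve. In this sketch, with an illustrative helper `read_polygon`, the median agrees with the curve to one decimal place, while the quartiles differ from the curve readings by up to about 0.75 kg; this is the kind of small difference expected between polygons and curves.

```python
def read_polygon(boundaries, cumulative, target):
    """x-value where a cumulative (percentage) frequency polygon reaches `target`."""
    for i in range(1, len(boundaries)):
        low, high = cumulative[i - 1], cumulative[i]
        if high > low and low <= target <= high:
            fraction = (target - low) / (high - low)
            return boundaries[i - 1] + fraction * (boundaries[i] - boundaries[i - 1])
    raise ValueError("target lies outside the cumulative values")


mass = [59.5, 64.5, 69.5, 74.5, 79.5, 84.5, 89.5]   # class boundaries (kg)
cum_pct = [0, 4, 16, 40, 68, 88, 100]               # cumulative % frequency

q1 = read_polygon(mass, cum_pct, 25)
q2 = read_polygon(mass, cum_pct, 50)
q3 = read_polygon(mass, cum_pct, 75)
print(f"Q1 = {q1:.2f} kg, median = {q2:.2f} kg, Q3 = {q3:.2f} kg, IQR = {q3 - q1:.2f} kg")
```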




Example 1.42
Examinations in English, Mathematics and Science were taken by 400 students. Each examination was marked out of 100 and the cumulative frequency graphs illustrating the examination results are shown below.

(Figure: Cumulative frequency graphs of the English, Mathematics and Science marks. Horizontal axis: mark; vertical axis: cumulative frequency, 0 to 400.)

(a) In which subject was the median mark the highest?
(b) In which subject was the interquartile range of the marks the greatest?
(c) In which subject did approximately 75% of the students score 50 marks or more? (C)

Solution 1.42
Showing on the diagrams: the median $Q_2$ is the 200th reading, the lower quartile $Q_1$ is the 100th reading and the upper quartile $Q_3$ is the 300th reading.

(Figure: The same graphs with the 100th, 200th and 300th readings marked.)

(a) The median, $Q_2$, is the highest in English.
(b) The interquartile range, $Q_3 - Q_1$, is the greatest in Science.
(c) The subject in which 300 students scored 50 or more, i.e. 75% scored 50 or more, is Science.


Using linear interpolation

It is possible to estimate the median, quartiles or other percentiles for grouped data without drawing the cumulative frequency graph. The method is known as linear interpolation.
rawing the cu 


Example 1.43
The ages of 160 members of a bridge club are grouped as shown in the table.

Age                 under 40   40-   50-   60-   70-   90-
Number of members   5          42    61    37    15    0

Without drawing a cumulative frequency curve, estimate
(a) the median age,
(b) the number of members aged 67 or over,
(c) the 20th percentile.

Solution 1.43
Form a cumulative frequency distribution.

Age                    <40   <50   <60   <70   <90
Cumulative frequency   5     47    108   145   160

(a) Since there are 160 observations, the median is the 80th observation.
From the table, 47 are under 50 and 108 are under 60, so the 80th person has an age in the interval 50-60.
108 − 47 = 61, so there are 61 people in the interval 50-60.
80 − 47 = 33, so, assuming that the ages are evenly distributed, the median value will be 33/61 of the way along the interval, which has a width of ten years.
∴ median = 50 + 33/61 × 10 = 55.4 years

(b) The age 67 is in the interval 60-70, which has a width of 10.
67 is located 7/10 of the way through this interval.
The number of people in the interval 60-70 is 145 − 108 = 37.
Using linear interpolation, 7/10 of the 37 people will be under 67 years old.
Now 7/10 of 37 = 25.9 ≈ 26
∴ number of people under 67 years old = 108 + 26 = 134
So number of people 67 or over = 160 − 134 = 26




(c) The 20th percentile is the value 20% of the way through the distribution.
There are 160 observations and 20% of 160 = 32, so the age of the 32nd person is needed.
Five people were under 40 and 47 people were under 50, so the 32nd person is in the interval 40-50.
The number of people in this interval = 47 − 5 = 42.
32 − 5 = 27, so $P_{20}$ will be 27/42 of the way through the interval 40-50.
$P_{20}$ = 40 + 27/42 × 10 = 46.4 years
The 20th percentile is 46.4 years (1 d.p.)
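The interpolation steps in Solution 1.43 can be written as two small functions: one goes from a cumulative count to an age, the other from an age to a cumulative count. This is a sketch; the function names are illustrative, and only the class boundaries that appear in the cumulative table are used, since the lower boundary of the first class is not given.

```python
# (upper class boundary, cumulative frequency) from the cumulative table
points = [(40, 5), (50, 47), (60, 108), (70, 145), (90, 160)]
total = 160


def age_at_count(count):
    """Age reached by the `count`-th member, interpolating linearly in a class."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= count <= c1:
            return x0 + (count - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("count outside the tabulated range")


def count_below(age):
    """Estimated number of members younger than `age`."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= age <= x1:
            return c0 + (age - x0) / (x1 - x0) * (c1 - c0)
    raise ValueError("age outside the tabulated range")


median_age = age_at_count(total / 2)          # 80th member
p20 = age_at_count(total * 20 / 100)          # 32nd member
aged_67_or_over = total - count_below(67)

print(f"median = {median_age:.1f}, P20 = {p20:.1f}, aged 67 or over = {aged_67_or_over:.0f}")
```

Here 7/10 of 37 is kept as 25.9 rather than rounded to 26 before subtracting, so the count comes out as 26.1, which still rounds to 26 members.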


Example 1.44
The distribution of the lengths of time of a large number of telephone calls made from an office in a given week was such that the median was 100 seconds and the 80th percentile was 190 seconds. Without drawing the cumulative frequency curve, estimate
(a) the upper quartile,
(b) the number of calls, out of 500, that lasted less than a minute.

Solution 1.44
(a) Show the information on a diagram, denoting the upper quartile by $Q_3$.

100 seconds (50%)        $Q_3$ (75%)        190 seconds (80%)

Percentage of distribution in interval 100-190 is 80% − 50% = 30%.
Percentage of distribution in interval 100-$Q_3$ is 75% − 50% = 25%.
So $Q_3$ is 25/30 of the way through the interval 100-190.
This interval has a width of 90, so $Q_3$ = 100 + 25/30 × 90 = 175.
The upper quartile is 175 seconds.

(b) x% of calls lasted less than 60 seconds and 50% lasted less than 100 seconds.
Assuming that the calls are spread evenly between 0 and 100 seconds, use ratios:
x/50 = 60/100, so x = 30
The number of calls = 30% of 500 = 150
150 calls lasted less than a minute.
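The two estimates in Solution 1.44 are short linear interpolations. The sketch below repeats them; as in the solution, part (b) assumes the call lengths are spread evenly between 0 and 100 seconds, so that 0 seconds corresponds to 0%. The variable names are illustrative.

```python
median_s, p80_s = 100, 190      # median and 80th percentile (seconds)

# (a) Q3 is (75 - 50)/(80 - 50) of the way from the median to the 80th percentile.
q3 = median_s + (75 - 50) / (80 - 50) * (p80_s - median_s)

# (b) Percentage under 60 s, assuming an even spread from 0 s (0%) to 100 s (50%).
pct_under_60 = 50 * 60 / median_s
calls_under_60 = 500 * pct_under_60 / 100

print(f"Q3 = {q3:.0f} s, {pct_under_60:.0f}% of calls, i.e. {calls_under_60:.0f} calls")
```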


Exercise 1k Cumulative frequency, median and quartiles – grouped data

1. The table below shows the frequency distribution of the masses of 52 women students at a college. Measurements have been recorded to the nearest kilogram.

Mass (kg)   Frequency
40-44       …
45-49       …
50-54       …
55-59       …
60-64       …
65-69       …
70-74       …

(a) Construct a cumulative frequency table and draw a cumulative frequency curve.
(b) How many students weighed less than 57 kg?
(c) How many students weighed more than 61 kg?
(d) 20% were heavier than x kg. Find the value of x.
(e) Estimate the median.
(f) Estimate the interquartile range.

2. Fifty soil samples were collected in an area of woodland, and the pH value for each sample was found. The cumulative frequency distribution was constructed as shown in the table.

pH value   Cumulative frequency
<4.8       1
<5.2       2
<5.6       5
<6.0       10
<6.4       19
<6.8       38
<7.2       43
<7.6       46
<8.0       49
<8.4       50

(a) …
(b) What percentage of the samples had a pH value less than 7?
(c) 50% of the samples had a pH value greater than x. Find x. What name is given to this value?
(d) Taking equal class intervals of 4.4 ≤ x < 4.8, 4.8 ≤ x < 5.2, etc., construct the frequency distribution and draw a histogram. Show the median on the histogram.

3. Eggs laid at Hill Farm are weighed and the results grouped as shown:

Mass (g)   Frequency
…

Construct a cumulative frequency table and draw a cumulative frequency curve. Use the curve to estimate the median mass.

4. The cumulative frequency curve has been drawn from information about the amount of time spent by 50 people in a supermarket on a particular day.

(Figure: Cumulative frequency curve. Horizontal axis: time (minutes), 5 to 40; vertical axis: cumulative frequency.)

(a) Construct the cumulative frequency table, taking boundaries ≤5, ≤10, ...
(b) How many people spent between 17 and 27 minutes in the supermarket?
(c) 60% of the people spent less than or equal to t minutes. Find t.
(d) 60% of the people spent longer than s minutes. Find s.
(e) Estimate the median.
(f) Find the interquartile range.
5. h, the length of life, in hours, of each of 50 light bulbs is noted. The results are summarised in the table. Using linear interpolation, calculate estimates of
(a) the median,
(b) the interquartile range.

Length of life (h)   Frequency
650 ≤ h < 670        3
670 ≤ h < 680        7
680 ≤ h < 690        20
690 ≤ h < 700        17
700 ≤ h < 702        3

6. A factory produces a certain component. The masses of 500 of these components were measured to the nearest gram and are grouped in the following table.

Mass (g)               60-69   70-74   75-79   80-84   85-89
Number of components   38      93      120     196     53

Without drawing a cumulative frequency curve, estimate
(a) the 60th percentile,
(b) the number of components whose mass is less than 78 grams.

7. The weekly maximum temperatures in a certain town were recorded, to the nearest degree Celsius, over a period of two years and grouped in the following table.

Temperature (°C)   Number of weeks
−5 to −1           8
0 to 4             12
5 to 9             17
10 to 14           31
15 to 19           23
20 to 24           9
25 to 29           4

(a) Draw the cumulative frequency curve.
(b) Use your curve to estimate the median.
(c) In a particular house it was found that the heating was turned on when the weekly maximum temperature was below 17 °C. Use your curve to estimate the number of weeks when the heating was turned on.
(d) A week is classified as extremely warm when the weekly maximum is greater than 21 °C. Use your curve to estimate the percentage of weeks that are classified as extremely warm. (C)

8. The times, to the nearest minute, taken by a group of 120 students to write a particular essay were recorded and are grouped in the table below.

Time (minutes)       40-44   45-49   50-54   55-59   60-64
Number of students   8       22      34      30      26

Construct the cumulative frequency table for this distribution and draw the cumulative frequency curve.
Use your curve to estimate
(a) … of the times,
(b) the percentage of these students who spent over 62 minutes in writing the essay.
Another group of 30 students wrote the same essay and all took over 65 minutes to complete it. Use your curve to estimate the median time of all 150 students. (C)

9. Each of 50 sportsmen was asked to state the distance, x km, he needs to travel to obtain access to suitable training facilities. The results are summarised in the table below.

Distance (x km)   Number of sportsmen
0 ≤ x < 4         1
4 ≤ x < 10        2
10 ≤ x < 20       6
20 ≤ x < 35       19
35 ≤ x < 60       12
60 ≤ x < 100      10

Construct the cumulative frequency table for the distribution and draw the cumulative frequency curve.


10. 


11. 


Use your curve to estimate 
m the median distance, 
: - interquartile range of the distances. 
e percentage of sportsmen who need to 
travel more than 30 km. {C) 


10. The prices, on a particular day, of 53 stocks on the London Stock Exchange are shown in the table below.

Price £x           Number of stocks
75 ≤ x < 95        6
95 ≤ x < 100       16
100 ≤ x < 105      13
105 ≤ x < 110      …
110 ≤ x < 120      5
120 ≤ x < 135      …

Construct the cumulative frequency table for this distribution and draw the cumulative frequency curve.
Use your curve to estimate
(a) the median price,
(b) the interquartile range,
(c) the number of stocks costing between £… and £….


11. The masses, measured to the nearest gram, of 80 eggs were recorded and are grouped in the table below.

Mass (g)          50–59   60–64   65–69   70–79
Number of eggs     18      20       x       y

Assuming that the readings in each class are uniformly distributed, and given that 60% of all the eggs have actual masses below 66.5 g, calculate the value of x and of y. (C)


12. 30 specimens of sheet steel are tested for tensile strength, measured in kN m⁻². The table below shows the distribution of the measurements.

Tensile strength   Number of specimens
405–415            …
415–425            …
425–435            …
435–445            …
445–455            4
455–465            2

Draw a cumulative frequency diagram of this distribution.
Estimate the median and the 10th and 90th percentiles. (O & C)


13. Every day at 08:28 a train departs from one city and travels to a second city. The times taken for the journey were recorded in minutes over a certain period and were grouped as shown in the table.

Time (minutes)   Frequency
–80              0
–85              …
–90              …
–95              …
–100             31
–105             …
–110             7
–115             4
–120             3
–125             …
Over 125         0

(The interval '–90' indicates all times greater than 85 minutes up to and including 90 minutes.)
From these figures draw a cumulative frequency curve and from this curve estimate
(a) the median time for the journey,
(b) the interquartile range,
(c) the number of trains which arrived at the second city between 10:00 and …. (C Additional)


14. Two hundred and fifty Army recruits have the heights shown in the table.

Height (cm)   No. of recruits
165–          18
170–          37
175–          60
180–          65
185–          48
190–195       22

Plot the data in the form of a cumulative frequency curve. Use your curve to estimate the median height and the upper quartile height.
The tallest …% of the recruits are formed into a special squad. Estimate the median and the upper quartile of the heights of the members of this squad.


15. The distribution of the times taken when a certain task was performed by each of a large number of people was such that its twentieth percentile was 25 minutes, its fortieth percentile was … minutes, its sixtieth percentile was … minutes and its eightieth percentile was 74 minutes.
Use linear interpolation to estimate
(a) the median of the distribution,
(b) the upper quartile of the distribution,
(c) the percentage of persons who performed the task in forty minutes or less. (NEAB)

16. The times, correct to the nearest second, for 190 athletes to cover one lap of a running track were recorded and are shown in the table below.

Time (s)   Number of athletes
65–69      0
70–74      8
75–79      20
80–84      25
85–89      34
90–94      …
95–99      …

Draw a cumulative frequency graph and determine the interquartile range.
To qualify for an athletics meeting, a runner needs to record a lap time of 78 seconds or under. Estimate the number of athletes who qualified and the median time for these qualifiers.

17. A group of 125 children raised money for a charity by sponsored activities. The amount raised by each child was recorded. These amounts, taken to the nearest £, are grouped in the table below.

Amount raised, £   Number of children
1–5                70
6–10               36
11–15              19

State the smallest possible amount which may have been raised by one child. Without drawing a cumulative frequency curve, estimate the median. Also estimate the mean amount raised and explain briefly why this is larger than the median. (C Additional)

18. In a borehole the thicknesses, in millimetres, of 25 strata are shown in the table.

Thickness (mm)   Number of strata

Draw a histogram to illustrate these data.
Construct a cumulative frequency table and draw a cumulative frequency polygon. Hence, or otherwise, estimate the median and the interquartile range for these data.
Find the proportion of the strata that are less than 28 mm thick. (C Additional)

SKEWNESS

On page 20 you considered the shape of various distributions. There are mathematical ways of expressing the degree of skewness of a distribution.

In a negatively-skewed distribution the tail of the distribution is pulled in the negative direction.
mean < median < mode

In a symmetrical distribution,
mean = mode = median

In a positively-skewed distribution the tail of the distribution is pulled in the positive direction.
mode < median < mean


Notice that when distributions are skew, the median generally lies between the mode and the mean, and the following relationship is approximately satisfied:

mean − mode ≈ 3(mean − median)


One measure of skewness is given by Pearson's coefficient of skewness:

Pearson's coefficient of skewness = (mean − mode) / standard deviation

If mean > mode, the skew is positive.
If mean < mode, the skew is negative.
If mean = mode, the skew is zero and the distribution is symmetrical.

Alternatively,

Pearson's coefficient of skewness = 3(mean − median) / standard deviation

Generally, skewness can take any value between −3 and 3.


For example, the measure of skewness for these distributions might be as shown:

[Figure: example distributions, each labelled with its measure of skewness.]


Example 1.45

Electric fuses, nominally rated at 30 amperes (30 A), are tested by passing a gradually increasing electric current through them and recording the current, x amperes, at which they blow. The results of this test on a sample of 125 such fuses are shown in the following table.

Current (x A)     Number of fuses
25 ≤ x < 28       6
28 ≤ x < 29       12
29 ≤ x < 30       27
30 ≤ x < 31       30
31 ≤ x < 32       18
32 ≤ x < 33       14
33 ≤ x < 34       9
34 ≤ x < 35       4
35 ≤ x < 40       5

Draw a histogram to represent these data.
For this sample calculate
(a) the median current,
(b) the mean current,
(c) the standard deviation of current.

A measure of the skewness (or asymmetry) of a distribution is given by

3(mean − median) / standard deviation

Calculate the value of this measure of skewness for the above data.
Explain briefly how this skewness is apparent in the shape of your histogram. (L)


Solution 1.45

Current           Interval width   Frequency   Frequency density = frequency ÷ interval width
25 ≤ x < 28       3                6           2
28 ≤ x < 29       1                12          12
29 ≤ x < 30       1                27          27
30 ≤ x < 31       1                30          30
31 ≤ x < 32       1                18          18
32 ≤ x < 33       1                14          14
33 ≤ x < 34       1                9           9
34 ≤ x < 35       1                4           4
35 ≤ x < 40       5                5           1

[Figure: Histogram to show the current at which fuses blow; frequency density (0 to 30) against current (A), from 25 A to 40 A.]


(a) For grouped data, the median is the ½nth value.

Since there are 125 observations, the median is the 62.5th. This can be found by linear interpolation as follows:

Since 45 fuses blew at a current less than 30 A and 75 fuses blew at a current less than 31 A, the median lies in the interval, of width 1 A, from 30 A to 31 A.

Median = 30 + (62.5 − 45)/30 × 1 = 30.58... = 30.6 A (1 d.p.)
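Mid-point (x)   f      xf        x²f
26.5            6      159       4 213.5
28.5            12     342       9 747
29.5            27     796.5     23 496.75
30.5            30     915       27 907.5
31.5            18     567       17 860.5
32.5            14     455       14 787.5
33.5            9      301.5     10 100.25
34.5            4      138       4 761
37.5            5      187.5     7 031.25
Total           125    3 861.5   119 905.25

(b) x̄ = Σxf / Σf = 3861.5 / 125 = 30.892

(c) s² = Σx²f / Σf − x̄² = 119 905.25 / 125 − 30.892² = 4.926...
    s = 2.219...

[Check these on your calculator, using SD mode.]

Therefore the mean is 30.892 A and the standard deviation is 2.22 A (2 d.p.).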


Skewness = 3(mean − median) / standard deviation
         = 3(30.892 − 30.58...) / 2.219...
         = 0.42 (2 d.p.)

Since skewness > 0, the distribution is positively skewed.

Note that the resulting frequency polygon confirms that the distribution is skewed to the right, i.e. positively skewed.


[Figure: frequency polygon of the fuse data; frequency density against current (A), from 25 A to 40 A.]
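The calculations in Solution 1.45 can be checked with a short script. This is a minimal Python sketch, assuming (as the solution does) that values are spread evenly within each class for the interpolated median, and that each class is represented by its mid-point for the mean and standard deviation (divisor n). The function names are illustrative, not from the text.

```python
# Grouped fuse data from Example 1.45: (lower boundary, upper boundary, frequency)
classes = [
    (25, 28, 6), (28, 29, 12), (29, 30, 27), (30, 31, 30), (31, 32, 18),
    (32, 33, 14), (33, 34, 9), (34, 35, 4), (35, 40, 5),
]

def grouped_median(classes):
    """Median of grouped data by linear interpolation at the (n/2)th value."""
    n = sum(f for _, _, f in classes)
    target = n / 2                      # 62.5 for n = 125
    cum = 0
    for lower, upper, f in classes:
        if cum + f >= target:
            # fraction of the way through this class, assuming an even spread
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty distribution")

def grouped_mean_sd(classes):
    """Mean and standard deviation (divisor n) using class mid-points."""
    n = sum(f for _, _, f in classes)
    sxf = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    sx2f = sum(((lo + hi) / 2) ** 2 * f for lo, hi, f in classes)
    mean = sxf / n
    return mean, (sx2f / n - mean ** 2) ** 0.5

median = grouped_median(classes)
mean, sd = grouped_mean_sd(classes)
skew = 3 * (mean - median) / sd
print(f"median {median:.2f}, mean {mean:.3f}, sd {sd:.2f}, skewness {skew:.2f}")
```

The printed values agree with the solution: median 30.58, mean 30.892, standard deviation 2.22 and skewness 0.42.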




Another measure of skewness uses the quartiles. Writing Q1 for the lower quartile, Q2 for the median and Q3 for the upper quartile,

Quartile coefficient of skewness = [(Q3 − Q2) − (Q2 − Q1)] / (Q3 − Q1)

Symmetrical distribution:         Q3 − Q2 = Q2 − Q1, so quartile skewness = 0
Positively skewed distribution:   Q3 − Q2 > Q2 − Q1, so quartile skewness > 0
Negatively skewed distribution:   Q3 − Q2 < Q2 − Q1, so quartile skewness < 0


Example 1.46
31 students tried to estimate the length of a line. The line was actually 60 mm long. These are their results, in millimetres.

61 70 46 44 26 23 30 83 52 44 38
37 49 59 58 63 31 29 37 48 76 61
46 31 38 41 49 52 56 75 61

Find the median and the quartiles of this distribution and use the quartiles to estimate the skewness.
Draw a histogram with equal class intervals 20 ≤ l < 30, 30 ≤ l < 40, ...

Solution 1.46
Arrange the results in order.

23 26 29 30 31 31 37 37 38 38 41 44 44 46 46 48
49 49 52 52 56 58 59 61 61 61 63 70 75 76 83

There are 31 results, so the median, Q2, is the ½(31 + 1)th value, i.e. the 16th value.
So median = 48.

To find the quartiles, since n is odd (see page 69):
Q1 = ¼(31 + 1)th value = 8th value = 37
Q3 = ¾(31 + 1)th value = 24th value = 61

Now Q3 − Q2 = 61 − 48 = 13
    Q2 − Q1 = 48 − 37 = 11

Since Q3 − Q2 > Q2 − Q1, the distribution is positively skewed.

Quartile coefficient of skewness = [(Q3 − Q2) − (Q2 − Q1)] / (Q3 − Q1) = (13 − 11)/(61 − 37) = 0.083...

This indicates a positive skew.


The frequency distribution is as shown below, together with the histogram. Since each interval has width 10 mm, frequency density = frequency ÷ 10.

Length (mm)     Frequency   Frequency density
20 ≤ l < 30     3           0.3
30 ≤ l < 40     7           0.7
40 ≤ l < 50     8           0.8
50 ≤ l < 60     5           0.5
60 ≤ l < 70     4           0.4
70 ≤ l < 80     3           0.3
80 ≤ l < 90     1           0.1

[Figure: histogram of frequency density (0 to 0.8) against length (mm), from 20 mm to 90 mm.]


The histogram confirms the positive skew. 
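A minimal Python sketch of the same calculation, assuming the convention used in this section for ungrouped data: the quartiles are the medians of the lower and upper halves, leaving out the median itself when n is odd (for n = 31 this picks the 8th and 24th values, as above). The helper names are illustrative.

```python
def median(sorted_values):
    """Median of a list that is already in order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 as medians of the lower half, whole set and upper half.
    When n is odd the median itself belongs to neither half."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

def quartile_skewness(q1, q2, q3):
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

# Line-length estimates (mm) from Example 1.46
lengths = [61, 70, 46, 44, 26, 23, 30, 83, 52, 44, 38,
           37, 49, 59, 58, 63, 31, 29, 37, 48, 76, 61,
           46, 31, 38, 41, 49, 52, 56, 75, 61]
q1, q2, q3 = quartiles(lengths)
print(q1, q2, q3, round(quartile_skewness(q1, q2, q3), 3))
```

This prints the quartiles 37, 48 and 61 and the coefficient 0.083, as in the solution.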


THE NORMAL DISTRIBUTION

There is a special symmetrical distribution known as the normal distribution. This is bell-shaped, centred around the mean.

Here are two normal distributions with the same mean, but different standard deviations.

[Figure: two normal curves with a common mean and different spreads.]

Here are two normal distributions with the same standard deviation but with different means.

[Figure: two normal curves of the same shape centred at different means.]

In a normal distribution:
• Approximately 68% of the distribution lies within one standard deviation (s) of the mean, i.e. between x̄ − s and x̄ + s.
• Approximately 95% of the distribution lies within two standard deviations (2s) of the mean.
• Nearly all of the distribution lies within three standard deviations (3s) of the mean.
• The quartiles are approximately ⅔ × standard deviation either side of the mean.
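These rules of thumb can be checked with Python's standard-library `statistics.NormalDist` (available from Python 3.8). A minimal sketch, working in units of the standard deviation; it is a check on the figures above rather than part of the original text, and `proportion_within` is an illustrative name.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} s of the mean: {proportion_within(k):.1%}")

# Upper quartile of the standard normal: about 0.674, roughly 2/3 of a standard deviation
print(f"quartiles: mean ± {z.inv_cdf(0.75):.3f} s")
# Middle 90%: mean ± about 1.645 standard deviations
print(f"middle 90%: mean ± {z.inv_cdf(0.95):.3f} s")
```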


The normal distribution is studied in greater detail in a later chapter.

1. Calculate Pearson's coefficient of skewness for the following frequency distribution, where

Pearson's coefficient of skewness = (mean − mode) / standard deviation

2. In a skewed distribution, the mean is 16, the mode is 20 and the standard deviation is 5.
Calculate Pearson's coefficient of skewness and sketch the curve.

3. For a skewed distribution, the mean is 6, the mode is 7 and the variance is 16.
Calculate Pearson's coefficient of skewness and sketch the curve.


4. The following table shows the time, to the nearest minute, spent … during a particular day by a group of school children.

Time     Number of children
10–19    …
20–24    …
25–29    …
30–39    …
40–49    …
50–64    …
65–89    …

(a) Represent these data by a histogram.
(b) Comment on the shape of the distribution.

5. A bank keeps a record of the number of cheques with errors … over a period of four years. The figures for the 300 accounting weeks are as follows.

Number of cheques with errors (x)   …
Number of weeks (f)                 5   22   46   38   31   23   …

Construct a suitable pictorial representation of these data.
State the modal value and calculate the median, mean and standard deviation of the number of cheques with errors in a week.
Some textbooks measure the skewness (or asymmetry) of a distribution by

3(mean − median) / standard deviation

and others measure it by

(mean − mode) / standard deviation

Calculate and compare the values of these two measures of skewness for the above data. State how this skewness is reflected in the shape of your graph. (AEB)


6. Find Pearson's coefficient of skewness for the distribution represented by this stem and leaf plot, which gives marks in an examination.

Stem | Leaf
1 | 9
2 | 1 2 8
3 | 3 7
4 | 5
5 | 2 5 5 7
6 | 1 1 6 6 8 8 8
7 | 3 5 5
8 | 2 9
9 | 1
Key: 3|7 means 37


7. [Figure: three cumulative frequency curves, labelled A, B and C.]

These are the three frequency curves associated with the cumulative frequency curves A, B, C above. Label each frequency curve with the appropriate letter.

[Figure: three frequency curves.]


8. The following table gives the blood pressure of 60 students.

Blood pressure   Frequency
95–              2
105–             5
110–             6
115–             9
120–             14
125–             3
130–             6
135–             5
140–             4
150–180          6

(a) Find
(i) Pearson's coefficient of skewness,
(ii) the quartile coefficient of skewness.
(b) Draw the histogram.


9. The following grouped frequency distribution summarises the time, to the nearest minute, spent waiting by a sample of patients in a doctor's surgery.

Waiting time (to the nearest minute)   Number of patients
3 or less                              6
4–6                                    15
7–8                                    27
9                                      49
10                                     52
11–12                                  29
13–15                                  13
16 or more                             9

The mean of the times was 9.63 minutes and the standard deviation was 3.03 minutes.
(a) Using interpolation, estimate the median and semi-interquartile range of these data.
Note: semi-interquartile range = interquartile range ÷ 2.
For a normal distribution the ratio of the semi-interquartile range to the standard deviation would be approximately 0.67.
(b) Calculate the corresponding value for the above data. Comment on your result.
For a normal distribution, 90% of times would be expected to lie in the interval (mean ± 1.645 standard deviations).
(c) Find the theoretical limits for these data.
(d) Using appropriate percentiles, estimate comparable limits. Comment on your result. (L)

10. Calculate the quartile coefficient of skewness for each of the following distributions:

[Figure: (a) a cumulative percentage frequency diagram.]


BOX AND WHISKER DIAGRAMS (BOX PLOTS)

Consider the cumulative percentage frequency curves for girls' and boys' marks, drawn on page 66. These are shown below, together with the median Q2 and quartiles Q1, Q3. Below each diagram is a box and whisker diagram, or box plot.

[Figure: cumulative percentage frequency curve of boys' marks (mark 0 to 100), with the box plot for boys' marks below it.]

[Figure: cumulative percentage frequency curve of girls' marks, with the box plot for girls' marks below it.]

The box plots can be drawn horizontally, as shown above, or vertically like this.

[Figure: vertical box plots of boys' marks and girls' marks.]


The box and whisker diagram, or box plot, illustrates the dispersion, or spread, of the distribution. It uses the highest and lowest values of the data, the quartiles (Q1 and Q3) and the median (Q2). For example:

[Figure: a box plot drawn vertically, labelled from top to bottom: highest value, upper quartile Q3, median, lower quartile Q1, lowest value; and the same box plot drawn horizontally, from lowest value to highest value.]

Notice that the 'box' extends from Q1 to Q3 and so encloses the middle 50% of the data.

The 'whiskers' extend from the box to the highest and lowest values and illustrate the range of the data.


A box plot for a symmetrical distribution would look like this:

[Figure: box plot of a symmetrical distribution.]

The whiskers are of equal length and the median is in the middle of the box.

For a positively skewed distribution:

[Figure: box plot of a positively skewed distribution.]

The right-hand whisker is longer and the median is nearer to the lower quartile.

For a negatively skewed distribution:

[Figure: box plot of a negatively skewed distribution.]

The left-hand whisker is longer and the median is nearer to the upper quartile.


Example 1.47 


A class of pupils played a computer game which tested how quickly they reacted to a visual 
instruction to press a particular key. The computer measured their reaction times in tenths of 
a second and stored a record of the sex and reaction time of each pupil. Finally it displayed 
the following summary statistics for the whole class. 


        Median   Lower quartile   Upper quartile   Min   Max
Girls   10       8                15               6     19
Boys    10       7                13               4     16


(a) Draw two box plots suitable for comparing the reaction times of boys and girls. 
(b) Write a brief comparison of the performance of boys and girls in this game. (NEAB) 


Solution 1.47 


(a) [Figure: box plots of the reaction times (0.1 s) of girls and boys, drawn on a common scale from 4 to 19.]


(b) The 'typical' (median) reaction time for boys and girls is the same (10 × 0.1 = 1 second). However, the times for the boys are more evenly distributed, with a smaller range. There is a bigger spread of times for girls and their distribution is positively skewed.

In general, the boys have the faster reaction times.
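The comparison in (b) can be made numerical with the quartile coefficient of skewness introduced earlier in this section. A minimal Python sketch using only the summary statistics given in the example; the function name `describe` is illustrative.

```python
# Five-number summaries from Example 1.47 (reaction times in tenths of a second)
summaries = {
    "Girls": {"min": 6, "q1": 8, "median": 10, "q3": 15, "max": 19},
    "Boys": {"min": 4, "q1": 7, "median": 10, "q3": 13, "max": 16},
}

def describe(s):
    """Return (range, interquartile range, quartile coefficient of skewness)."""
    rng = s["max"] - s["min"]
    iqr = s["q3"] - s["q1"]
    skew = ((s["q3"] - s["median"]) - (s["median"] - s["q1"])) / iqr
    return rng, iqr, skew

for group, s in summaries.items():
    rng, iqr, skew = describe(s)
    print(f"{group}: range {rng}, IQR {iqr}, quartile skewness {skew:.2f}")
```

The girls' coefficient is 3/7 ≈ 0.43 while the boys' is 0, which agrees with the comment that only the girls' distribution is positively skewed.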


Example 1.48
A group of children carry out a survey of the numbers of sweets in each of 50 packets. Their results are shown in the following stem and leaf diagram.

Key 2|4 means 24 sweets in a packet.

[Stem and leaf diagram with split stems 2, 3, 3 and 4; the rows hold 13, 15, 15 and 7 values, ranging from 24 to 44.]

(a) Calculate the median and the quartiles of this distribution.
(b) Draw a box plot for the distribution. (NEAB)

Solution 1.48
(a) There are 50 items, so the median is the ½(50 + 1)th value, i.e. the 25.5th value.
This is half-way between the 25th and 26th values, which are 33 and 34.
So median, Q2 = 33.5 sweets.
There are 25 items to the left of Q2, so Q1 is the 13th value, i.e. Q1 = 29 sweets.
There are 25 items to the right of Q2, so Q3 is the 38th value, i.e. Q3 = 38 sweets.

Remember that the quartiles divide in half the two distributions either side of the median.

Note that the pattern is easy to see:

12 items       Q1    12 items        Q2     12 items       Q3    12 items
24 26 ...      29    30 30 ... 33    33.5   34 35 ...      38    38 38 ... 44

(b) [Figure: box plot of the number of sweets in a packet: lowest value 24, Q1 = 29, median 33.5, Q3 = 38, highest value 44.]


Example 1.49

A group of athletes frequently run round a cross-country course. The box and whisker plots below represent the times taken by athletes A, B, C and D to complete the course.

[Figure: box and whisker plots of the times (minutes) taken by athletes A, B, C and D.]

(a) Compare the times taken by athletes C and D.

Assume that the distributions shown above are representative of the times the athletes would take in a race over the same course.

(b) Which of the athletes A or B would you choose if you were asked to select one of them to race against
(i) C,
(ii) D?
Give a reason for each answer.
(c) Which athlete would be most likely to win a race between A and B? (AEB)


Solution 1.49

(a) D is always faster than C.
C's times are more variable than D's.
C's times are positively skewed; D's times are negatively skewed.

(b) (i) B's median (average) time is faster than C's, but B's times are more variable. It is probable that B would win against C.
Although A's slowest time of approximately 32 minutes appears to be slightly greater than C's fastest time, A will almost certainly win against C.
Therefore choose A to win a race against C.
(ii) A has a small chance of winning against D, but B has a slightly greater chance of winning against D.
Therefore choose B to win against D.

(c) A's median is faster than B's and A's times are not as variable as B's.
Therefore choose A to win a race between A and B.


Outliers

Sometimes unusually high or low values occur in a set of data. There may be good reason for these unusual results, but quite often they occur because an error was made when the data were recorded.

To investigate extreme values you would use the mean and standard deviation, or the quartiles and the interquartile range (IQR). As a rough guide, outliers are taken to be data which are

(a) at least 2 standard deviations from the mean, i.e. less than x̄ − 2s or greater than x̄ + 2s, or
(b) at least 1½ × interquartile range beyond the nearer quartile.

Interquartile range = Q3 − Q1, so illustrating (b) on a diagram gives:

[Figure: outliers lie below the lower boundary Q1 − 1.5 × (Q3 − Q1) or above the upper boundary Q3 + 1.5 × (Q3 − Q1).]
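A minimal Python sketch of rules (a) and (b), applied for illustration to the line-length estimates of Example 1.46. Assumptions: the standard deviation uses divisor n (`statistics.pstdev`), as elsewhere in this section; the quartiles are the medians of the lower and upper halves; and rule (b) uses a strict inequality, matching the definition in Example 1.52. The function names are hypothetical.

```python
from statistics import mean, pstdev

def median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(data[:half]), median(data), median(upper)

def outliers_sd(values, k=2):
    """Rule (a): values at least k standard deviations (divisor n) from the mean."""
    m, s = mean(values), pstdev(values)
    return sorted(x for x in values if abs(x - m) >= k * s)

def outliers_iqr(values, k=1.5):
    """Rule (b): values more than k times the IQR beyond the nearer quartile."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return sorted(x for x in values if x < q1 - k * iqr or x > q3 + k * iqr)

# Line-length estimates (mm) from Example 1.46
lengths = [61, 70, 46, 44, 26, 23, 30, 83, 52, 44, 38,
           37, 49, 59, 58, 63, 31, 29, 37, 48, 76, 61,
           46, 31, 38, 41, 49, 52, 56, 75, 61]
print("rule (a):", outliers_sd(lengths))
print("rule (b):", outliers_iqr(lengths))
```

For these data the two rules disagree: 83 mm lies more than two standard deviations above the mean (x̄ ≈ 48.8, s ≈ 15.2), but within 1½ × IQR of Q3. Both rules are only rough guides, so borderline values call for judgement.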


Example 1.50
A class of 34 children recorded the maximum daily temperature for the month of July with the following results. The median and quartiles are ringed.

[Stem and leaf diagram of the maximum daily temperatures (°F).]

(a) Identify any outliers, giving the values of the quartiles and illustrating your results on a boxplot.
(b) Identify any outliers using the mean and the standard deviation.


Solution 1.50
(a) The values ringed are 66, 70 and 73, so Q1 = 66 °F, Q2 = 70 °F, Q3 = 73 °F.
Interquartile range = Q3 − Q1 = 73° − 66° = 7 °F
Upper boundary = Q3 + 1.5 × 7 = 73 + 10.5 = 83.5 °F
Lower boundary = Q1 − 1.5 × 7 = 66 − 10.5 = 55.5 °F

Outliers therefore lie outside the interval 55.5° to 83.5°.

It would appear that the temperature recorded as 94 °F is an outlier. (It was probably recorded wrongly, since it is most unusual to have just one day with an extremely high temperature.) The temperature of 57 °F, however, is not an outlier.

The whiskers are drawn down to 57 °F and up to 81 °F and the temperature of 94 °F is labelled as an outlier, as shown.

[Figure: Boxplot to show temperatures (°F), with 94 °F marked as an outlier.]

(b) Using calculator in SD mode:
x̄ = 71.0 and s = 7.17 (3 s.f.), so
x̄ − 2s = 56.6, x̄ + 2s = 85.3

Since outliers lie outside these values, 57 °F is not an outlier, but 94 °F is an outlier.


Exercise 1m

1. The table below gives the lengths, in minutes, of 50 telephone calls from a school office.

Length of call (min)   Number of calls
0–1                    8
1–2                    11
2–3                    7
3–5                    8
5–10                   6
≥10                    0

(a) Draw a cumulative frequency polygon.
(b) Estimate the median and the quartiles.
(c) Draw a box plot and comment on the distribution.

2. Two groups of people took part in a reaction-timing experiment. Their results, to the nearest hundredth of a second, are shown below. Construct box plots to represent the distributions, and comment.

[Back-to-back stem and leaf diagram of the reaction times of Group 1 and Group 2; keys: 4|2 means 24 hundredths of a second (Group 1), 2|4 means 24 hundredths of a second (Group 2).]

3. Twenty-one girls estimated the length of a line, in millimetres. The results were

51 45 31 43 97 16 18 23 34 35 35
85 62 20 22 51 57 49 22 18 27

Draw the box plot and use it to identify any outliers.


4. Thirty-one people completed a jigsaw in the following times (in minutes).

[Table of the 31 completion times.]

Using the mean and standard deviation, identify any outliers.

5. Draw cumulative frequency polygons and then construct box and whisker diagrams to represent the following histograms. For each one, calculate Q3 − Q2 and Q2 − Q1. What do you notice?
Hint: remember that frequency density = frequency ÷ class width.

[Figure: histograms (a) and (b) of frequency density against time (s).]

6. The box plots show the distributions of marks obtained by a class in English and Mathematics. Comment on the distributions of marks.

[Figure: box plots of English marks and Mathematics marks.]

7. This back-to-back stemplot gives daily hours of sunshine in December and July. Find the median and quartiles for each month and construct the box plots. Comment on the distributions.

[Back-to-back stemplot of daily hours of sunshine in December and July.]

8. These are the times of the postal delivery to my house over four successive weeks.

[Table of delivery times.]

(a) Draw a stem and leaf diagram.
(b) Find the median time.
(c) Find the quartiles.
(d) Draw a box and whisker diagram.


9. Draw box plots to represent the following frequency distributions.

[Figure: frequency distributions (a) and (b).]


10. A frequency diagram for a set of data is shown below.

[Figure: frequency diagram; values 0 to 15 on the horizontal axis, frequency on the vertical axis.]

(a) Find the median and the mode of the data.
(b) Given that the mean is 5.95 and the standard deviation is 2.58, explain why the value 15 may be regarded as an outlier.
(c) Explain how you would treat the outlier if the diagram represents
(i) the ages (in completed years) of children at a party,
(ii) the sums of the scores obtained when throwing a pair of dice.
(d) Find the median and the mode of the data after the outlier is removed.
(e) Without doing any calculations, state what effect, if any, removing the outlier would have on the mean and on the standard deviation.
(f) Does the diagram exhibit positive skewness, negative skewness or no skewness?
How is the skewness affected by removing the outlier? (MEI)


11. In a test on the protein quality of a new strain of corn, a farmer fed 20 new-born chicks with the new corn and observed how much weight they gained after three weeks. The results are given below.

Weight gain (grams)
360, 445, 403, 376, 434, 402, 397, 425, 407, 369,
462, 399, 427, 420, 410, 391, 430, 369, 410, 397

(a) Make an ordered stem and leaf display of these data.

The farmer also fed a further 20 new-born chicks on the standard strain of corn he had previously used and he recorded their weight gains after three weeks. The results for this control group are given in the ordered stem and leaf display in the table below.

Weight gain (grams); unit is 1 gram
32 | 1 5
33 |
34 | …
35 | 0 6 6
36 | 0 1 6
37 | …
38 | 0 6
39 | 9
40 | 2
41 | 7
42 | 1 3

(b) On a single diagram draw two box-and-whisker plots, one for the weight gains of the chicks fed the new strain of corn and the other for the weight gains of the control group fed the standard strain of corn.
(c) Use your box and whisker plots to compare and contrast the two distributions. (O)


12. A random sample of 51 people was asked to record the number of miles they travelled by car in a given week. The distances, to the nearest mile, are shown below.

63 44 47 57 68 81 …

(a) Construct a stem and leaf diagram to represent these data.
(b) Find the median and the quartiles of this distribution.
(c) Draw a box plot to represent these data.
(d) Give one advantage of using
(i) a stem and leaf diagram,
(ii) a box plot,
to illustrate data such as that given above. (L)


• Vertical line graphs: ungrouped discrete data
  Height represents frequency.
  Mode is denoted by the tallest line.

• Stem and leaf diagrams (stemplot)
  Key 2|7 means 27
  Stem | Leaf
  1 | 0 4 5 9
  2 | 2 2 3 5 6 7 7
  3 | 1 2 2 7 8
  4 | 3 3 4 6
  5 | 2 3 7
  The stemplot must have a key.
  Equal width intervals must be chosen. Here the intervals are 10–19, 20–29, 30–39, 40–49, 50–59.

• Histograms: grouped data
  Area ∝ frequency
  Frequency density = frequency ÷ interval width
  Interval width = upper class boundary − lower class boundary
  Modal class is represented by the tallest rectangle.

• Frequency polygons: grouped data
  Plot frequency density against the mid-interval value.
  Join with straight lines.

• Pie charts
  Area ∝ frequency
  To compare sets of data with total frequencies F1 and F2, draw circles with radii r1 and r2 in the ratio √F1 : √F2.
• Mean, x̄
  Raw data: x̄ = Σx / n
  Frequency distributions: x̄ = Σxf / Σf
  When data are grouped, x is taken to represent the interval: x = ½(lower boundary + upper boundary).

• Standard deviation, s
  Raw data: variance = Σx² / n − x̄²
  Frequency distributions: variance = Σx²f / Σf − x̄²
  variance = s², so s = √variance
• Scaling data
  If y = ax + b, where a and b are constants,
  then ȳ = ax̄ + b and s_y = |a| s_x.

• Coding data
  If y = (x − a)/b, then x = a + by,
  x̄ = a + bȳ and s_x = |b| s_y.

• Combining sets of numbers, x and y
  new mean = (n_x x̄ + n_y ȳ) / (n_x + n_y)
  new variance = (Σx² + Σy²) / (n_x + n_y) − (new mean)²

• Weighted means
  If x1, x2, ..., xn are given weightings w1, w2, ..., wn, then
  weighted mean = (w1x1 + w2x2 + ... + wnxn) / (w1 + w2 + ... + wn) = Σwixi / Σwi
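The combining and weighted-mean formulas above can be checked numerically. A minimal Python sketch; splitting the ten times of Example 1.51 into two sets of five is purely for illustration, the function names are hypothetical, and variances use divisor n, as in this chapter.

```python
from statistics import mean, pvariance

def combine(n_x, mean_x, sum_x2, n_y, mean_y, sum_y2):
    """Mean and variance (divisor n) of two data sets pooled together."""
    n = n_x + n_y
    new_mean = (n_x * mean_x + n_y * mean_y) / n
    new_variance = (sum_x2 + sum_y2) / n - new_mean ** 2
    return new_mean, new_variance

def weighted_mean(values, weights):
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# The ten 400 m times of Example 1.51, split into two sets of five for illustration
x = [53.7, 59.3, 53.8, 53.6, 56.8]
y = [54.0, 53.2, 55.7, 54.2, 52.7]

new_mean, new_variance = combine(len(x), mean(x), sum(t * t for t in x),
                                 len(y), mean(y), sum(t * t for t in y))
print(round(new_mean, 3), round(new_variance, 3))
# The pooled mean is also a weighted mean of the two set means, weighted by set size
print(round(weighted_mean([mean(x), mean(y)], [len(x), len(y)]), 3))
```

Both routes give a combined mean of 54.7 and a combined variance of 3.658, the same as working directly from all ten values.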


• Cumulative frequency is the total frequency up to a particular observation.
  (a) Ungrouped data: step diagram. The steepest step denotes the mode.
  (b) Grouped data: cumulative frequency curve or polygon.
      Plot cumulative frequency against upper class boundary.
      Join with a curve (or with straight lines for a cumulative frequency polygon).


• Median, quartiles and percentiles
  For n observations arranged in order of size:
  the median, Q2, is the value 50% of the way through the distribution,
  the lower quartile, Q1, is the value 25% of the way through the distribution,
  the upper quartile, Q3, is the value 75% of the way through the distribution,
  the xth percentile, Px, is the value x% of the way through the distribution.

        Ungrouped data                                  Grouped data
  Q2    ½(n + 1)th value                                ½nth value = 50% value
  Q1    Q1 and Q3 divide the distribution either        ¼nth value = 25% value
  Q3    side of the median in half                      ¾nth value = 75% value

• Ranges
  Range = highest value − lowest value
  Interquartile range = upper quartile − lower quartile = Q3 − Q1
  Middle 80% of readings = P90 − P10


• Skewness
  In a symmetrical distribution,
  mean = mode = median
  Q3 − Q2 = Q2 − Q1

  In a positively-skewed distribution the tail of the distribution is pulled in the positive direction.
  mode < median < mean
  Q3 − Q2 > Q2 − Q1

  In a negatively-skewed distribution the tail of the distribution is pulled in the negative direction.
  mean < median < mode
  Q3 − Q2 < Q2 − Q1

  Pearson's coefficient of skewness = (mean − mode) / standard deviation or 3(mean − median) / standard deviation

  Quartile coefficient of skewness = [(Q3 − Q2) − (Q2 − Q1)] / (Q3 − Q1)


• Box and whisker diagrams (boxplots)
  [Figure: box plots for a symmetrical distribution, a positively skewed distribution and a negatively skewed distribution.]

• Outliers, as a rough guide:
  points at least two standard deviations from the mean;
  points lying more than 1.5 times the interquartile range above Q3 or below Q1.


Miscellaneous worked examples

Example 1.51
The times t (in seconds) taken by an athlete to run 400 metres on ten successive days were

53.7, 59.3, 53.8, 53.6, 56.8, 54.0, 53.2, 55.7, 54.2, 52.7.

(If required, you may use Σt = 547.0, Σt² = 29 957.48.)
(a) Calculate the mean of the times.
(b) Calculate the standard deviation of the times.
(c) Determine the median of the times. (C)

Solution 1.51
(a) mean t̄ = Σt / n = 547.0 / 10 = 54.7 seconds
(b) s = √(Σt²/n − t̄²) = √(29 957.48/10 − 54.7²) = √3.658 = 1.9 seconds (2 s.f.)
(c) There are 10 values, so the median is the ½(10 + 1)th value = 5.5th value.
Re-arranging the times in order of size:

52.7, 53.2, 53.6, 53.7, 53.8, 54.0, 54.2, 55.7, 56.8, 59.3

The 5.5th value is half way between 53.8 and 54.0, i.e.
median = ½(53.8 + 54.0) = 53.9 seconds
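A short Python check of Solution 1.51, using the standard-library `statistics` module. This is a sketch: the standard deviation is computed from the same formula as the solution (divisor n), and `statistics.pstdev` is used only as a cross-check.

```python
from statistics import median, pstdev

times = [53.7, 59.3, 53.8, 53.6, 56.8, 54.0, 53.2, 55.7, 54.2, 52.7]
n = len(times)
sum_t = sum(times)                  # given in the question as 547.0
sum_t2 = sum(t * t for t in times)  # given in the question as 29 957.48

mean_t = sum_t / n
s = (sum_t2 / n - mean_t ** 2) ** 0.5  # divisor n, as in the solution

print(f"mean {mean_t:.1f} s, sd {s:.1f} s, median {median(times):.1f} s")
# pstdev (population standard deviation) should agree with s
print(f"pstdev check: {pstdev(times):.4f}")
```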


Example 1.52

Applicants for an assembly job are required to take a test of manual dexterity. The times, in seconds, taken to complete the task by 19 applicants were as follows:

63, 229, 165, 77, 49, 74, 67, 59, 66, 102, 81, 72, 59, 74, 61, 82, 48, 70, 86.

For these data find
(a) the median,
(b) the upper and lower quartiles.

An outlier here is defined as any observation less than Q1 − 1.5(Q3 − Q1) or greater than Q3 + 1.5(Q3 − Q1), where Q1 is the lower quartile and Q3 is the upper quartile.

(c) Identify any outliers in the data.
(d) Illustrate the data by a box and whisker plot. Outliers, if any, should not be included in the whiskers. (AEB)


Solution 1.52
Arranging the data in order:

48 49 59 59 61 63 66 67 70 72 74 74 77 81 82 86 106 165 229

(a) $Q_2 = \tfrac{1}{2}(19 + 1)$th item = 10th item = 72 seconds

(b) $Q_1 = \tfrac{1}{4}(19 + 1)$th item = 5th item = 61 seconds
$Q_3 = \tfrac{3}{4}(19 + 1)$th item = 15th item = 82 seconds

(c) $Q_1 - 1.5(Q_3 - Q_1) = 61 - 1.5 \times 21 = 29.5$
$Q_3 + 1.5(Q_3 - Q_1) = 82 + 1.5 \times 21 = 113.5$

So outliers are less than 29.5 or greater than 113.5.
Therefore the outliers are 165 and 229.
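The positional method used in Solution 1.52, taking the $\tfrac{1}{4}(n + 1)$th, $\tfrac{1}{2}(n + 1)$th and $\tfrac{3}{4}(n + 1)$th ordered values, can be sketched in Python as below. The helper `value_at_position` is a hypothetical name. Interpolating between neighbouring values at a fractional position is one common convention, but other textbooks and software locate quartiles slightly differently. For n = 19 every position is a whole number, so no interpolation is needed here.

```python
import math


def value_at_position(sorted_data, pos):
    """Value at a 1-based position, interpolating linearly if pos is fractional."""
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return sorted_data[lower - 1]
    return sorted_data[lower - 1] + frac * (sorted_data[lower] - sorted_data[lower - 1])


times = [63, 229, 165, 77, 49, 74, 67, 59, 66, 106, 81,
         72, 59, 74, 61, 82, 48, 70, 86]
data = sorted(times)
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)
q2 = value_at_position(data, (n + 1) / 2)
q3 = value_at_position(data, 3 * (n + 1) / 4)

iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, lower_fence, upper_fence, outliers)
# 61 72 82 29.5 113.5 [165, 229]
```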


(d) Box and whisker plot to show times taken to complete the task

[Figure: box and whisker plot of the times, with the outliers plotted separately; horizontal axis: time in seconds.]


Example 1.53
Whig and Penn, solicitors, monitored the time spent on consultations with a random sample of 120 of their clients. The times, to the nearest minute, are summarised in the following table.

Time (minutes)   Number of clients
10-14             2
15-19             5
20-24            17
25-29            33
30-34            27
35-44            25
45-59             7
60-89             3
90-119            1
Total           120

(a) By calculation, obtain estimates of the median and quartiles of this distribution.

(b) Comment on the skewness of the distribution.

(c) Explain briefly why these data are consistent with the distribution of times you might expect in this situation.

(d) Calculate estimates of the mean and variance of the population of times from which these data were obtained.

The solicitors are undecided whether to use the median and quartiles, or the mean and standard deviation, to summarise these data.

(e) State, giving a reason, which you would recommend them to use.

(f) Given that the least time spent with a client was 12 minutes and the longest time was 116 minutes, draw a box plot to represent these data. Use graph paper and show your scale clearly.

Law and Court, another group of solicitors, monitored the times spent with a random sample of their clients. They found that the least time spent with a client was 20 minutes, the longest time was 40 minutes and the quartiles were 24, 30 and 36 minutes respectively.

(g) Using the same graph paper and the same scale, draw a box plot to represent these data.

(h) Compare and contrast the two box plots. (L)
Solution 1.53
(a)
Time (minutes)   Cumulative frequency
< 14.5              2
< 19.5              7
< 24.5             24
< 29.5             57
< 34.5             84
< 44.5            109
< 59.5            116
< 89.5            119
< 119.5           120

For grouped continuous data, with n = 120,
$Q_1$ is the $\tfrac{1}{4}n$th value, i.e. the 30th value,
$Q_2$ is the $\tfrac{1}{2}n$th value, i.e. the 60th value,
$Q_3$ is the $\tfrac{3}{4}n$th value, i.e. the 90th value.

$Q_1$ lies in the interval 24.5-29.5 (width 5). There are 24 items below this interval and 33 items in it, so
$Q_1 = 24.5 + \dfrac{30 - 24}{33} \times 5 = 25.4$ min

$Q_2$ lies in the interval 29.5-34.5 (width 5). There are 57 items below this interval and 27 items in it, so
$Q_2 = 29.5 + \dfrac{60 - 57}{27} \times 5 = 30.1$ min (approximately 30 min)

$Q_3$ lies in the interval 34.5-44.5 (width 10). There are 84 items below this interval and 25 items in it, so
$Q_3 = 34.5 + \dfrac{90 - 84}{25} \times 10 = 36.9$ min

(b) $Q_3 - Q_2 \approx 6.8$ min and $Q_2 - Q_1 \approx 4.6$ min.
Since $Q_3 - Q_2 > Q_2 - Q_1$, this implies a positive skew.
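The interpolation used in part (a) can be written as one general routine. Below is a minimal Python sketch, assuming the class boundaries are listed in order, with one more boundary than there are classes. The name `grouped_quantile` is hypothetical. Positions of n/4, n/2 and 3n/4 are used, as in the solution for grouped continuous data.

```python
def grouped_quantile(boundaries, freqs, position):
    """Estimate the value at a cumulative position by linear interpolation
    within the class that contains it."""
    cum = 0
    for i, f in enumerate(freqs):
        if f > 0 and cum + f >= position:
            width = boundaries[i + 1] - boundaries[i]
            return boundaries[i] + (position - cum) / f * width
        cum += f
    raise ValueError("position exceeds the total frequency")


# Class boundaries and frequencies from Example 1.53
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 44.5, 59.5, 89.5, 119.5]
freqs = [2, 5, 17, 33, 27, 25, 7, 3, 1]
n = sum(freqs)

q1, q2, q3 = (grouped_quantile(boundaries, freqs, p) for p in (n / 4, n / 2, 3 * n / 4))
print(round(q1, 1), round(q2, 1), round(q3, 1))  # 25.4 30.1 36.9
print("positive skew" if q3 - q2 > q2 - q1 else "no positive skew")
```

The same function gives any other percentile. For example, a position of 0.9n gives the 90th percentile.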


(c) Very few consultations take over an hour. Most take just under half an hour.

(d)
Mid-point, x     f
12               2
17               5
22              17
27              33
32              27
39.5            25
52               7
74.5             3
104.5            1

Using the calculator:
$\bar{x} = 32.6$ (1 d.p.)
$s^2 = 160.7$ (1 d.p.)
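The calculator results in part (d) can be reproduced from the mid-points and frequencies. This Python sketch uses $s^2 = \dfrac{\sum fx^2}{n} - \bar{x}^2$, with divisor n, which is the value quoted above.

```python
# Mid-points and frequencies from Example 1.53
midpoints = [12, 17, 22, 27, 32, 39.5, 52, 74.5, 104.5]
freqs = [2, 5, 17, 33, 27, 25, 7, 3, 1]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2  # divisor n, matching s^2 in Solution 1.53
print(round(mean, 1), round(variance, 1))  # 32.6 160.7
```

If a divisor of n − 1 is preferred, multiply the variance by n/(n − 1).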


(e) It is better to use the median and quartiles because the distribution is skewed.


(f), (g)
[Figure: box plots for Whig and Penn and for Law and Court, drawn on the same scale.]
(h) Both have a small interquartile range, i.e. small variability.
Both have the same median (30 min).
Whig and Penn's times have a much greater range of values than those of Law and Court.


Miscellaneous exercise 1n

1. In 1798 the English scientist Henry Cavendish measured the specific gravity of the earth by careful work with a torsion balance. He obtained the 29 measurements given below.

4.07 4.88 5.10 5.26 5.27 5.29 5.29 5.30
5.34 5.34 5.36 5.39 5.42 5.44 5.46 5.47
5.50 5.53 5.55 5.57 5.58 5.61 5.62 5.63
5.65 5.75 5.79 5.85 5.86

The sum of these measurements is 157.17 and the sum of their squares is 855.0227.

(a) Calculate the mean measurement and the standard deviation of the measurements.
Obtain for these data the range, the median and the quartiles.
Draw a box plot and use it to identify the outlier in Cavendish's data set.

(b) If the data were analysed without this outlier, calculate the new values of
(i) the median,
(ii) the mean,
(iii) the standard deviation. (NEAB)

2. The table shows the distribution of the lifetimes (measured to the nearest hour) of a sample of batteries.

Lifetime (to nearest hour)   Frequency
690-709                       3
710-719                       7
720-729                      15
730-739                      38
740-744                      41
745-749                      35
750-754                      21
755-759                      16
760-769                      14
770-789                      10

(a) Draw a histogram to represent the data.
(b) Draw a cumulative frequency polygon.
(c) Calculate the mean and standard deviation.
(d) Estimate the median and the quartiles.
(e) Calculate Pearson's coefficient of skewness.
(f) Calculate the quartile coefficient of skewness.
(g) Draw a box and whisker diagram to illustrate the distribution.

3. (a) The sum of 20 numbers is 320 and the sum of their squares is 5840. Calculate the mean of the 20 numbers and the standard deviation.
[…] added to these 20 so that […] show that the […]

(b) Another set of 10 numbers is such that their sum is 430 and the sum of their squares is 2380. This set is combined with the original 20 numbers. Calculate the mean and standard deviation of all 30 numbers. (C Additional)

4. A grouped frequency distribution of the ages of 358 employees in a factory is shown in the table below. Estimate, to the nearest month, the mean and the standard deviation of the ages of these employees.
Graphically, or otherwise, estimate
(a) the median and the interquartile range of the ages, each to the nearest month,
(b) the percentages, to one decimal place, of the employees who are over 27 years old and under 55 years old.

Age (last birthday)   Number of employees
16-20                 36
21-25                 56
26-30                 58
31-35                 52
36-40                 46
41-45                 38
46-50                 36
51-60                 36
61-                    0
(L)

5. 200 candidates sat an examination and the distribution was obtained as shown in the table.

Marks (x)   Frequency
10-19       10
20-29       18
30-39       20
40-49       30
50-59       49
60-69       46
70-79       20
80-89        5
90-99        2

(a) If the limits of class 40-49 are 39.5 to 49.5, what is the mid-interval value of this class?
(b) Calculate the mean of the marks, explaining any limitations of your calculation.
(c) Plot a cumulative frequency curve and use it to estimate the upper and lower quartiles.
(d) Assuming that your estimates are exact, find values for a and b, correct to two significant figures, in order that the above marks can be scaled by the equation y = ax + b, where y is the new mark, so that the mean becomes 4[…] and the lower quartile becomes 35.
State, with reason, whether the quartiles of the original marks will scale into the quartiles of the scaled marks.


6. A school entered 88 students for an examination. The results of the examination are shown in the table below.

Mark (x)       Frequency
0 < x < 10      3
10 < x < 20     6
20 < x < 30     9
30 < x < 40    10
40 < x < 50     2
50 < x < 60    18
60 < x < 70    14
70 < x < 80    11
80 < x < 90    […]

(a) Calculate, showing your working and giving your answers correct to two decimal places, an estimate of
(i) the mean mark,
(ii) the variance,
(iii) the standard deviation.

(b) Copy and complete the following cumulative frequency table.

Mark (x)   Cumulative frequency
x < 10
x < 20
x < 30
x < 40
x < 50
x < 60
x < 70
x < 80
x < 90

(c) Using 2 cm to represent 10 marks on the horizontal axis and 2 cm to represent […] students on the vertical axis, draw, on graph paper, a cumulative frequency polygon to illustrate the distribution of the marks.

(d) Use your graph to estimate
(i) the median mark,
(ii) the interquartile range.

(e) The lowest mark required to obtain a grade A in this examination was 75. Use your graph to estimate the number of students who were awarded grade A in this examination.


7. Data were collected from a survey of 150 […]. Their daily expenditure on mid-day meals is summarised in the table. Illustrate the data by means of a cumulative frequency graph. Hence estimate the median daily expenditure. (C)

Daily expenditure (£)   Frequency
0.66-0.90                1
0.91-1.15               28
1.16-1.30               38
1.31-1.45               34
1.46-1.70               27
1.71-2.00               12
8. The hourly wages, £x, of the 15 workers in a small factory are as follows:

£6.60, £3.40, £6.45, £5.20, £3.6[…], £7.25, £9.60, £3.75, £4.20, […]78, £4.50, £3.95, £4.75, £12.25, […]

(a) Represent the data in a stem and leaf diagram, using pounds for the stem and pence for the leaves. Clearly indicate the median wage. State the range.

(b) Given that Σx = 90.00 and Σx² = 631.25, calculate the mean and standard deviation of hourly wages of the workers.

After delicate wage negotiations, the workers are offered a choice of one of the following pay awards:
(A) an increase of 30 pence per hour,
(B) a 5% rise in hourly wages.

(c) Use your answers in part (b) to deduce the mean and standard deviation of hourly wages of the 15 workers under both schemes.

(d) Explain why the management would not mind which scheme was implemented, but the workers might. (MEI)
9. The table below shows the distribution of the lengths of pebbles from the bed of a […].

Length, x millimetres   Frequency
0 < x < 5               10
5 < x < 10               8
10 < x < 20             12
20 < x < 50             25
50 < x < 100            30

(a) You are given that the frequency density for the length class 0 < x < 5 is 10. Write down the frequency densities for the other classes.
(b) Illustrate the data in a histogram.
(c) Use your histogram, or otherwise, to estimate the modal length of a pebble.
(d) Calculate estimates of
(i) the mean length of a pebble,
(ii) the median length of a pebble. (O)
10. An advertising campaign to promote electric showers consists of a mailshot which includes a […] card requesting further details. […]

[Figure: two pie charts, 'Sales completed' and 'Potential customers contacted', each divided between the sales staff Pandora, Muruvet, Magnus, Jemma and Gideon.]

(a) The total number of potential customers contacted is 1100. Find, approximately, the total number of sales completed.

(b) Describe the main features of the data revealed by the pie charts.

(c) The manager wishes to compare the sales staff according to the number of sales completed. What type of diagram would you recommend, in place of a pie chart, so that this comparison could be made easily? (AEB)


11. An examination is taken by two sets of candidates from the same school. The number of candidates in each set, the mean marks and the variances are shown below.

         Number of candidates   Mean mark   Variance
Set A    20                     69          9
Set B    30                     54          39

Calculate the mean mark for all 50 candidates and show that the standard deviation of all 50 marks is 9.

It is suggested that the original marks of the candidates from Set A should be linearly scaled so that their scaled marks would have a standard deviation of 9 and a mean mark equal to the mean mark of all 50 candidates.

(a) What effect would this have on an original mark of 60 obtained by a candidate from Set A?
(b) Given that the original marks of the candidates in Set A were all integers, explain why no mark would remain unchanged. (C Additional)

12. Machine A is set to cut lengths of wood 100 mm long. To test the accuracy of the machine, a random sample is taken from the output. The sample size is denoted by n and the length in millimetres of each piece of wood is denoted by x. The results are summarised by

n = 50, Σx = 5035, Σx² = 507 033.

Calculate the mean and standard deviation of the lengths in the sample, giving your answers correct to one decimal place.

Machine B is also set to cut lengths of wood 100 mm long. A random sample of 50 items from this machine has mean 100.2 mm and standard deviation 1.1 mm. Giving a reason, comment briefly on the accuracy of the two machines.

13. A frequency diagram for a set of data is shown in the figure. No scale is given on the frequency axis, but summary statistics are given for the distribution:

Σf = 50, Σfx = 100, Σfx² = 344.

[Figure: frequency diagram for the data.]

(a) State the mode and the mid-range value of the data.
(b) Identify two features of the distribution.
(c) Calculate the mean and standard deviation of the data and explain why the value 8, which occurs just once, may be regarded as an outlier.
(d) Explain how you would regard the outlier if the data were
(A) the difference of the scores obtained when throwing a pair of ordinary dice,
(B) the number of children per household in a neighbourhood survey.
(e) Calculate new values for the mean and standard deviation if the single outlier is removed. (MEI)


14. Data collected from a survey of the cost of 4320 houses in a town are summarised in the table below.

Cost (£)           Number of houses
20 001-50 000      54
50 001-60 000      […]
60 001-70 000      1320
70 001-100 000     860
100 001-150 000    450

On graph paper, illustrate the data by means of a cumulative frequency graph. Use your graph to estimate the median cost and the interquartile range. (C)


15. The age distribution of the applicants for a job is recorded in the table below.


Age (years) 20- 35— 40- 45- 50 60: 
Number of : 
applicants 14. 120—~«7 8 9 

0 


Draw, on graph paper, a histogram to represent the data. Estimate […] of the distribution, and the median age of the applicants who are less than 50 years of age. (C)

16. The cumulative frequency table below gives the lengths, in minutes, of 400 telephone calls made from a certain household during a period of […].

Length of call (minutes)   Number of calls
< 1                         20
< 2                         67
< 2½                       118
< […]                      177
< […]                      315
< 10                       400

Construct the corresponding frequency table and draw a histogram to illustrate the data. Use your table to estimate the median, and state the geometrical property of a vertical line drawn on the histogram at this value. (C Additional)


17. The figure shows a cumulative frequency curve for the length of telephone calls from my house during the first six months of last year.

[Figure: cumulative frequency curve; vertical axis: cumulative frequency (up to 160); horizontal axis: time in minutes.]

(a) Find the median and inter-quartile range.

(b) Construct a histogram with six equal intervals to illustrate the data.

(c) Use the frequency distribution associated with your histogram to estimate the mean length of call.

(d) State whether each of the following is true or false:
(A) the distribution of these call times is negatively skewed,
(B) the majority of the calls last longer than 6 minutes,
(C) the majority of the calls last between 5 and 10 minutes,
(D) the majority of the calls are shorter than the mean length. (MEI)


Mixed test 1A


1. One hundred runners competed in a half-marathon race. The table below shows N, the number of runners who completed the course within T minutes of the start.

T    65   85   95   105   145   155
N     0   25   53    73    99   100

Construct the corresponding frequency table and use this
(a) to draw a histogram to represent the data,
(b) to estimate the mean value of T.


2. The following stem and leaf diagram summarises the blood glucose level, in mmol/l, of a patient, measured daily over a period of time.
Blood glucose level 5|0 means 5.0 Totals 
5 01312233344 (12) 
5 556678 899 { 9) 
6 0111234444 (10) 
6 556789 9 { ) 
7 412223 ( ) 
7 5.7 2. Ga) 
8 411223 34 tia 
g | 799 (3) 
9 | 0112 (4) 
9 | 579 (3) 


(a) Write down the numbers required to complete the stem and leaf diagram.

(b) Find the median and quartiles of these data.

(c) On graph paper, construct a box plot to represent these data. Show your scale clearly.

(d) Comment on the skewness of the distribution. (L)


3. The 30 members of the Darton town orchestra each recorded the amount of individual practice, x hours, they did in the first week of June. The results are summarised as follows:

Σx = 225, Σx² = 1755.

The mean and standard deviation of the number of hours of practice undertaken by the members of the Darton orchestra in this week were μ and σ respectively.

(a) Find μ.

(b) Find σ.

Two new people joined the orchestra, and the numbers of hours of individual practice they did in the first week of June were μ − 20 and μ + 20.

(c) State, giving your reasons, whether the effect of including these two members was to increase, decrease or leave unchanged the mean and standard deviation.


4. A newsagent carried out a survey to gather general information about her customers, and the readability of her magazines. Table 1 shows a classification of the customers during one hour of trading, and Table 2 shows the number of words per sentence for a sample of 100 sentences taken from a magazine.

Table 1
                      Child/Student   Adult female   Adult male
Number of customers   5               22             28

Table 2
Words per sentence    1-5   6-10   11-15   16-25   26-45
Number of sentences   18    32     22      14      14

(a) State a suitable type of diagram which could represent the data in Table 1.

(b) The survey was carried out on a Monday morning. Give one possible reason why conclusions based upon the results of Table 1 should be treated with caution.

(c) Represent the data in Table 2 by means of an accurately drawn histogram on graph paper.

(d) Use the figures in Table 2 to calculate, correct to three significant figures, estimates of the mean and standard deviation of the number of words per sentence.

(e) The numbers of words per sentence in a sample of sentences taken from a second magazine were presented in a table similar to Table 2 and with the same intervals. From this new table, an estimate of the mean number of words per sentence was found to be 9.145. In fact, the only sentence in this sample with more than 25 words was one which was 32 words long. Calculate an improved estimate for the mean of this second sample. (C)


5. The table shows data about the time, in seconds to the nearest second, for completing each one of a series of 75 similar chemical experiments.

Time (s)   Number of experiments
50-60       4
61-65      13
66-70      26
71-75      22
76-86      10

(a) State the type of diagram […].

(b) Calculations using the data in the table give the following values:
mean time of the experiments = 69.6 s,
standard deviation = 6.37 s.
Explain why these are estimates rather than precise values.

(c) Estimate the median and the interquartile range of the times taken for completing the experiments.

(d) It was subsequently revealed that the four experiments in the 50-60 class had actually taken 57, 59, 59 and 60 seconds respectively. State, without further calculation, what effect (if any) there would be on the estimates of the median, interquartile range and mean if this information were taken into account. (C)

6. As part of a detailed study of its workforce, a large company selected a random sample of 100 male employees and recorded the time each employee had been with the company. The histogram illustrates the distribution produced.

[Figure: histogram of frequency density against time (years) employed by the company, for a random sample of 100 of its male employees.]

(a) Copy and complete the following table.

Time (years)                    0-   2-   5-   10-   15-   20-30
Number of males in the sample   35

(b) Calculate estimates of the median and quartile times for the males in the sample.

(c) An equivalent random sample of female employees gave calculated estimates of […] years for the median, and 1.8 years and 8.6 years for the quartile times. […] with the company for 20 years. The sample included a woman who had […] the company. Draw box plots to compare the distributions of the lengths of time male employees and female employees had been with the company.

(d) List three differences between the two distributions as illustrated by the box plots. (NEAB)




Mixed test 1B

1. In a transport survey, the number of passengers in each of 523 cars travelling into a town centre on a particular morning was recorded. The results are summarised in the following table.

Number of passengers in a car   0     1     2     3    4
Number of cars                  183   160   108   63   9

(a) Calculate the mean number of passengers in a car, giving your answer correct to three significant figures.
(b) State the mode of the number of people (i.e. passengers plus driver) in a car in the survey.
(c) It is given that, correct to three significant figures, the standard deviation of the number of passengers in a car in the survey is 1.09. State the standard deviation of the number of people in a car in the survey. (C)

2. A student collected some data on the heights, x cm, of plants of a particular species. She chose to represent the data in a stem and leaf display, as shown below.

Unit is 1 cm
1 | 1 2 2 3 4 4 4 5 6 7 7 9
2 | 1 1 1 2 5 5 7
3 | 1 2 2 5 5 9
4 | 4 […] 8

(a) (i) Explain why the data might be better represented by a […] stem and leaf display.
(ii) Rewrite the above data in such a display.
(b) Calculate an estimate of the mean height, in centimetres, of plants of this species.
(c) Calculate the median of the data given in the display.
(d) State which of the mean and median would be a better measure of location for the heights of these 29 plants. Give a reason for your answer. (O)

3. A pie chart was drawn, for each of the years 1990 and 1995, to illustrate the amounts spent by a householder on electricity, gas, water and telephone, and to compare the total amounts spent in the two years.

(a) Given that the radii of the 1990 and 1995 charts were 15 cm and 18 cm respectively, calculate the percentage increase in the total amount spent.

The amount spent on water in 1995 was twice the amount spent in 1990. In the 1995 chart the amount spent on water was represented by an angle of 47°.

(b) Calculate, to the nearest degree, the angle of the corresponding sector in the 1990 pie chart. (C)

4. A school cleaner is approaching pensionable age. She lives halfway between two post offices, A and B, and has to decide from which of the two she will arrange to collect her pension. For a few months she has deliberately used the two post offices alternately when she has required postal services. On each of these visits she has recorded the time taken between entering the post office and being served.
The boxplots below show these waiting times for the two post offices. The symbol * represents an outlier.

[Figure: boxplots of the waiting times at Post Office A and Post Office B, with outliers marked *.]

(a) Compare, in words, the distributions of the waiting times in the two post offices.
(b) Advise the cleaner which post office to use if the outliers were due to
(i) a cable laying company having severed the electricity supply to the post office,
(ii) the post office being short-staffed. (AEB)

5. Thirty children were given a task to perform and the times taken were recorded, each to the next whole number of minutes above the actual time. The results were as follows:

12 20 14 17 17  8 19 13 27 13
16 18 10  7 22 16 11 18 18  6
16 12 14 23 15  8 10 17 16 19

(a) Copy and complete the following stem and leaf diagram to illustrate the above data.

Key: 10|5 represents a time of 15 minutes
00 | 6 7 8 8
10 | 0 0 1 2 2
10 | 5 6 6 6
20 | 0 2 3
20 | 7

(b) Use your diagram to estimate the median and the quartiles of the distribution of times taken to complete the task.
(c) Draw a box plot to illustrate the data. (NEAB)

6. The diagram shows a cumulative frequency polygon for the numbers of competitors who completed a marathon within 2, 2½, 3, 4 and 7 hours.

[Figure: cumulative frequency polygon; vertical axis: number of competitors (up to 500); horizontal axis: time in hours.]

(a) Use the diagram to estimate
(i) the median,
(ii) the quartiles,
of the times taken by the 500 competitors who completed the run.
(b) Copy and complete the following frequency table.

Time in hours        2-2½   2½-3   3-4   4-7
No. of competitors

(c) Calculate an estimate of
(i) the mean,
(ii) the standard deviation,
of the 500 competitors' times. (NEAB)


Regression and correlation 


In this chapter you will learn how to

• interpret scatter diagrams for bivariate data
• calculate the equations of the least squares regression lines and use them to estimate values
• calculate and interpret the value of the product-moment correlation coefficient
• calculate and interpret the value of Spearman's rank correlation coefficient


SCATTER DIAGRAMS 


Suppose you wish to investigate the relationship between two variables x and y, for example

- the mass at the end of a spring (x) and the length of the spring (y)
- the time spent studying for an examination (x) and the mark achieved (y)
- a student's mark in a French test (x) and the mark in a German test (y)
- the diameter of the stem of a plant (x) and the average length of leaf of the plant (y).

Data connecting two variables are known as bivariate data. When pairs of values are plotted, a scatter diagram is produced. Here are some examples:


Dependent and independent variables 


If one of the variables has been controlled, it is called the independent or explanatory variable. 


The other variable is then the dependent or response variable. 




For example, if you place weights of 10 g, 20 g, 25 g, 30 g, 35 g, 50 g, etc on the end of a 
spring and record the length of the spring for each weight, the weight is controlled so it is the 
independent variable. The length of the spring is the dependent variable. 


REGRESSION FUNCTION 


Having drawn a scatter diagram, you can then look for a mathematical relationship between 
the variables, y = f(x), where the function f, known as the regression function, is to be 
determined. 


LINEAR CORRELATION AND REGRESSION LINES 


Consider the simplest type of regression function, where y = f(x) is a straight line. 
If the points on the scatter diagram appear to lie near a straight line, called a regression line, 
you would say that there is linear correlation between x and y. Here are some examples: 


[Figure: three scatter diagrams. Positive linear correlation: the points lie close to a regression line and y tends to increase as x increases. Negative linear correlation: the points lie close to a regression line and y tends to decrease as x increases. No correlation: no relationship between x and y.]


Common sense and care are needed when interpreting scatter diagrams. 


• Mathematically, there may appear to be a relationship, but this does not imply that there is a relationship in reality. You might find, for example, that over a period of time in a particular city there has been an increase in the number of robberies and an increase in the number of health food shops. It would, however, be foolish to conclude that there is a relationship between these two variables.

• The appearance of a mathematical relationship does not imply that there is a causal relationship. An increase in one variable does not necessarily cause an increase, or decrease, in the other variable.


If it appears from the scatter diagram that a linear relationship is a sensible interpretation, you may then attempt to find a model for the relationship in the form of a regression line.

[Figure: a line of 'best fit' drawn 'by eye'.]

In previous work you may have drawn a line of best fit on the scatter diagram, attempting to draw it so that there are as many points above the line as below it, or as many points to the left of the line as to the right of it. The line should also go through the point $(\bar{x}, \bar{y})$, the means of the two sets of data.

This method, known as drawing 'by eye', is rather haphazard. There is a mathematical way of fitting the regression line, known as the method of least squares, and this is illustrated in the following example.


Consider the situation in which the mass, y g, of a chemical is related to the time, x minutes, for which the chemical reaction has been taking place, according to the table:

Time, x min   5   7    12   16   20
Mass, y g     4   12   18   21   24


These results can be illustrated on a scatter diagram. 


From the scatter diagram there seems to be a positive linear correlation between the mass and the time.

The line of best fit must pass through the means of both sets of data, i.e. the point $(\bar{x}, \bar{y})$. You should find, by calculation, that this is the point (12, 15.8). It has been plotted on the diagram.


Diagrams 1 and 2 show attempts at drawing the line of best fit. 


[Figure: Diagram 1 and Diagram 2, two attempts at drawing the line of best fit through the data points.]

In each of the attempts, the dotted line goes through $(\bar{x}, \bar{y})$ and there are three points above the line and two points below it. Yet neither of these lines is correct.


Diagram 3 shows the true line of best fit. It has equation

$y = 1.15 + 1.22x$

This equation has been calculated by using the method of least squares, and the calculations are shown on page 123.

[Figure: Diagram 3, the line of best fit $y = 1.15 + 1.22x$.]




Note that the times, x, are chosen by the person holding the stopwatch, so x is the 
independent variable. The values of the mass, y, depend on the results of the chemical process 
at these times, therefore y is the dependent variable. If you were to repeat the experiment with 
the same values of x, you would almost certainly get a different set of values of y. So for a 


fixed value of x you could have several different values of y, all in the same vertical line on the 
scatter diagram. 


Least squares regression line of y on x 


To find the equation of the least squares regression line of y on x for the chemical experiment data, consider the vertical distances $m_1, m_2, m_3, m_4, m_5$ drawn from each point to the regression line. These distances will be positive or negative according to whether the points are above or below the line, so instead work with the squares of these values and consider their sum, $m_1^2 + m_2^2 + m_3^2 + m_4^2 + m_5^2$. A shorthand way of writing this is $\sum m_i^2$, where

$\sum m_i^2 = m_1^2 + m_2^2 + m_3^2 + m_4^2 + m_5^2$, for $i = 1, 2, 3, 4, 5$.

A line that fits the data well is one that makes $\sum m_i^2$ as small as possible, i.e. it is drawn so that $\sum m_i^2$ is minimised.


This line is called the least squares regression line of y on x. 


Consider our three attempts at drawing the line of best fit. The vertical distances have been shown, and you can see that $\sum m_i^2$ is least in diagram 3.

[Figure: Diagrams 1, 2 and 3, each showing the vertical distances $m_1, \dots, m_5$ from the points to the line.]
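The comparison between the three diagrams can be made numerically. This Python sketch computes $\sum m_i^2$ for several lines through (12, 15.8). The function name is hypothetical, and the gradients 0.9 and 1.5 are arbitrary alternatives chosen for illustration, not the lines actually drawn in Diagrams 1 and 2.

```python
# Chemical experiment data: time x (minutes) and mass y (grams)
xs = [5, 7, 12, 16, 20]
ys = [4, 12, 18, 21, 24]


def sum_sq_vertical(a, b):
    """Sum of m_i^2, where m_i = y_i - (a + b*x_i) is the vertical distance to y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Lines through the point of means (12, 15.8) with different gradients
for b in (0.9, 1.22, 1.5):
    a = 15.8 - b * 12  # chosen so that the line passes through (12, 15.8)
    print(f"b = {b}: sum of m_i^2 = {sum_sq_vertical(a, b):.2f}")
```

The gradient 1.22 gives the smallest sum (about 23.29, compared with 39.14 and 35.30 for the other two). This is consistent with the least squares line quoted above.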


Useful formulae when calculating regression lines 


Before looking at how to find the equation of the regression line, here is a reminder of the 
formulae for the mean (page 28) and the variance (page 37) of a set of data together with a 
new formula that connects the x and y data, the covariance. 

For the x data:

The mean of the x data is $\bar{x}$, where $\bar{x} = \frac{\sum x}{n}$.

The variance is usually written $s^2$, but to distinguish that it is the variance of the x data, you
could write $s_x^2$. Usually, however, when working in the context of regression and correlation,
the variance of the x data is written $s_{xx}$.

Remember that there are alternative formats of the variance:

$s_{xx} = \frac{1}{n}\sum (x - \bar{x})^2$ or $s_{xx} = \frac{1}{n}\sum x^2 - \bar{x}^2$

For the y data:

$\bar{y} = \frac{\sum y}{n}$

$s_{yy} = \frac{1}{n}\sum (y - \bar{y})^2$ or $s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2$

For the x and y data:

The covariance, $s_{xy}$, connects the x and y data and the formula is

$s_{xy} = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$ or $s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y}$

In some textbooks and formulae booklets you might see the notation $S_{xx}$, $S_{yy}$ and $S_{xy}$. These
are known as the 'big S' formulae and are derived from the 'small s' formulae above as follows:

$S_{xx} = n s_{xx} = \sum (x - \bar{x})^2$ or $S_{xx} = \sum x^2 - n\bar{x}^2 = \sum x^2 - \frac{(\sum x)^2}{n}$

$S_{yy} = n s_{yy} = \sum (y - \bar{y})^2$ or $S_{yy} = \sum y^2 - n\bar{y}^2 = \sum y^2 - \frac{(\sum y)^2}{n}$

$S_{xy} = n s_{xy} = \sum (x - \bar{x})(y - \bar{y})$ or $S_{xy} = \sum xy - n\bar{x}\bar{y} = \sum xy - \frac{\sum x \sum y}{n}$

The big S formulae are useful in calculations where the factor of $\frac{1}{n}$ cancels. However, it should be
remembered that they are not the formulae for the variance and covariance.
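As a check on the formulae above, the following Python sketch evaluates both the small s and the big S versions. It uses the chemical experiment data (x = 5, 7, 12, 16, 20 and y = 4, 12, 18, 21, 24). The variable names are illustrative assumptions, and only standard Python 3 is used.

```python
# Chemical experiment data used in this section (x = time, y = mass)
xs = [5, 7, 12, 16, 20]
ys = [4, 12, 18, 21, 24]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# 'Small s' formulae: variance of x, variance of y and covariance (divide by n)
s_xx = sum(x * x for x in xs) / n - x_bar ** 2
s_yy = sum(y * y for y in ys) / n - y_bar ** 2
s_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar

# 'Big S' formulae: sums of squares and products (no division by n)
S_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
S_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
S_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

print(f"s_xx = {s_xx:.2f}, s_yy = {s_yy:.2f}, s_xy = {s_xy:.2f}")
print(f"S_xx = {S_xx:.2f}, S_yy = {S_yy:.2f}, S_xy = {S_xy:.2f}")
```

Each big S value is n = 5 times the matching small s value (for example, 154 = 5 × 30.8). This is why the factor $\frac{1}{n}$ cancels when a ratio such as $S_{xy}/S_{xx}$ is formed.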


The equation of the regression line y on x 


You are probably familiar with the equation of a straight line in the form

$y = mx + c$

where m is the gradient and c is the y-intercept.

[Figure: the straight line y = mx + c, with gradient m, crossing the y-axis at (0, c).]

When writing the equation of the regression line, a slightly different format is usually used, in which the
constant term is written before the x-term and the letters used are a and b. The format is

$y = a + bx$

where b is the gradient and a is the y-intercept.

The formulae above are used to find the values of a and b for a particular set of data.

For the regression line y on x written in the form

$y = a + bx$

the gradient, b, can be calculated as follows:

$b = \frac{s_{xy}}{s_{xx}}$ or $b = \frac{S_{xy}}{S_{xx}}$

Note that b is known as the regression coefficient of y on x.
To find a, use the fact that $(\bar{x}, \bar{y})$ lies on the line.
If $y = a + bx$ then $\bar{y} = a + b\bar{x}$

Rearranging: $a = \bar{y} - b\bar{x}$


To find the equation of the regression line y on x for the chemical experiment data, set out the calculations in a table:

x     y     x²     y²     xy
5     4     25     16     20
7     12    49     144    84
12    18    144    324    216
16    21    256    441    336
20    24    400    576    480
$\sum x = 60$   $\sum y = 79$   $\sum x^2 = 874$   $\sum y^2 = 1501$   $\sum xy = 1136$

There are five pairs of data, so n = 5.

$\bar{x} = \frac{\sum x}{n} = \frac{60}{5} = 12$ and $\bar{y} = \frac{\sum y}{n} = \frac{79}{5} = 15.8$

$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1}{5} \times 1136 - 12 \times 15.8 = 37.6$

$s_{xx} = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{1}{5} \times 874 - 12^2 = 30.8$

For the regression line y on x in the form y = a + bx:

$b = \frac{s_{xy}}{s_{xx}} = \frac{37.6}{30.8} = 1.2207\ldots = 1.22$ (2 d.p.)

and $a = \bar{y} - b\bar{x} = 15.8 - 1.2207\ldots \times 12 = 1.150\ldots = 1.15$ (2 d.p.)

So the equation of the regression line y on x is y = 1.15 + 1.22x.
If you use the big S formulae:

$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 1136 - \frac{60 \times 79}{5} = 188$

$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 874 - \frac{60^2}{5} = 154$

$b = \frac{S_{xy}}{S_{xx}} = \frac{188}{154} = 1.2207\ldots = 1.22$ (2 d.p.)

and a is calculated as above to give a = 1.15 (2 d.p.).
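The calculation can be wrapped in a small function. The Python sketch below is one possible way of doing this. The function name `regression_y_on_x` is hypothetical. It applies $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$ to raw data.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares regression line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Big S formulae
    S_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    S_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = S_xy / S_xx           # regression coefficient of y on x (the gradient)
    a = y_bar - b * x_bar     # the line passes through (x_bar, y_bar)
    return a, b

a, b = regression_y_on_x([5, 7, 12, 16, 20], [4, 12, 18, 21, 24])
print(f"y = {a:.2f} + {b:.2f}x")   # y = 1.15 + 1.22x
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` from the standard library returns the slope and intercept of the same least squares line. This gives a convenient cross-check.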
An alternative way of working out the equation is to use the following form of the
equation of a straight line:

If m is the gradient and the line goes through a fixed point (h, k), the equation can
be written

$y - k = m(x - h)$

In the case of the regression line y on x, the gradient is b and the fixed point is $(\bar{x}, \bar{y})$, so the
equation of the regression line y on x can be written

$y - \bar{y} = b(x - \bar{x})$

For the above data relating to the chemical experiment, the equation is

$y - 15.8 = 1.2207(x - 12)$
$y - 15.8 = 1.2207x - 14.648$
$y = 1.2207x + 1.152$
$y = 1.22x + 1.15$ (2 d.p.), as before
Summarising:

The least squares regression line y on x is

$y = a + bx$ where $a = \bar{y} - b\bar{x}$ and $b = \frac{s_{xy}}{s_{xx}} = \frac{S_{xy}}{S_{xx}}$

Alternatively

$y - \bar{y} = b(x - \bar{x})$

b is the gradient of the regression line and is known as the regression coefficient of y on x.

Note that any of the formats described above can be used to find the equation of the regression line. It is
important, however, to make sure that you are familiar with the format given in your
examination formulae booklet, which may be one of these, or one in a
different form.


Making predictions using the regression line y on x 


The regression line y on x gives you the average value of y for a given value of x, so in certain
circumstances it can be used to predict or estimate missing values. This is known as
interpolating from the given information.


The regression line y on x is used

• when x is the independent variable and you want to estimate y for a given value of x, or
you want to estimate x for a given value of y,

• when neither variable is controlled and you want to estimate y for a given value of x.

For the chemical reaction data, in which x is the independent variable, you can use the
regression line y = 1.15 + 1.22x to estimate (a) y when x = 10, (b) x when y = 20, as follows:
(a) The estimate of y when x = 10, written $\hat{y}$, is given by

$\hat{y} = 1.15 + 1.22 \times 10 = 13.35$

(b) The estimate for x when y = 20, written $\hat{x}$, is given by

$20 = 1.15 + 1.22\hat{x}$
$1.22\hat{x} = 18.85$
$\hat{x} = 15.45$ (2 d.p.)


Warning: you must take care, though, as estimating outside the range of your data is
unreliable. For example, for the chemical reaction data, when the reactants have formed their
product, the reaction ceases and the mass would not continue to increase. Going outside the
range of data is known as extrapolating from the given information.
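The two estimates above, together with the warning about extrapolation, can be sketched in Python as follows. The function names are hypothetical. The coefficients are the rounded values 1.15 and 1.22 from the text, and the data range 5 ≤ x ≤ 20 is taken from the chemical experiment table. Rearranging the line to estimate x is appropriate here only because x is the controlled variable.

```python
# Coefficients of the regression line y on x for the chemical experiment data
A, B = 1.15, 1.22
X_MIN, X_MAX = 5, 20  # smallest and largest observed x values

def predict_y(x):
    """Estimate y for a given x using y = A + Bx."""
    return A + B * x

def estimate_x(y):
    """Estimate x for a given y by rearranging y = A + Bx.
    Appropriate here only because x is the controlled (independent) variable."""
    return (y - A) / B

def within_range(x):
    """True for interpolation (inside the data range); False means extrapolation."""
    return X_MIN <= x <= X_MAX

y_hat = predict_y(10)
x_hat = estimate_x(20)
print(f"y when x = 10: {y_hat:.2f}")        # 13.35
print(f"x when y = 20: {x_hat:.2f}")        # 15.45
print(within_range(10), within_range(40))   # True False
```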


Important note: In the situation where neither variable is controlled and you want to estimate 
x for a given value of y, you would use a different regression line, the least squares line x on y. 


You would also use the regression line x on y if y is the independent variable. This is described 
more fully on page 130. 


Using a calculator to find the regression line y on x 


Linear regression (LR) mode on the calculator enables you to input the pairs of data $(x_i, y_i)$
and then obtain the values of a and b and also $\bar{x}$, $\bar{y}$, $\sum x$, $\sum x^2$, $\sum y$, $\sum y^2$, $\sum xy$ and n. On the
calculator, the value of a is usually denoted by A and the value of b by B.


Your calculator may follow a similar procedure to that outlined below. If not, you should 
consult your calculator manual. 


Casio 85W/85WA/570W

Set LR mode                [MODE] [3] [1]  or  [MODE] [MODE] [2] [1]
Clear memories             [SHIFT] [Scl] [=]
Input data                 [5] [,] [4] [DT]
                           [7] [,] [12] [DT]
                           [12] [,] [18] [DT]
                           [16] [,] [21] [DT]
                           [20] [,] [24] [DT]

You now have access to:
A = 1.1506...              [SHIFT] [7] [=]
B = 1.2207...              [SHIFT] [8] [=]
Equation of regression line: y = A + Bx, so y = 1.15 + 1.22x

You can check the following:
$\sum x^2 = 874$            [RCL] [A]
$\sum x = 60$               [RCL] [B]
n = 5                      [RCL] [C]
$\sum y^2 = 1501$           [RCL] [D]
$\sum y = 79$               [RCL] [E]
$\sum xy = 1136$            [RCL] [F]
$\bar{x} = 12$              [SHIFT] [1] [=]
$s_{xx} = 30.8$             [SHIFT] [2] [=] [x²] [=]
$\bar{y} = 15.8$            [SHIFT] [4] [=]
$s_{yy} = 50.56$            [SHIFT] [5] [=] [x²] [=]
To clear LR mode           [MODE] [1]

To estimate y when x = 10, key in
[10] [SHIFT] [ŷ] to give 13.35...
To estimate x when y = 20, key in
[20] [SHIFT] [x̂] to give 15.44...


Example 2.1 


One measure of personal fitness is the time taken for an individual's pulse rate to return to
normal after strenuous exercise; the greater the fitness, the shorter the time. Reg and Norman
have the same normal pulse rates. Following a short programme of strenuous exercise they
both recorded their pulse rates P at time t minutes after they had stopped exercising.
Norman's results are given in the table below.

t    0.5    1.0    1.5    2.0    3.0    4.0    5.0
P    125    113    102    94     81     83     71

(a) Draw a scatter diagram to represent this information.

The equation of the regression line of P on t for Norman's data is

P = 122.3 − 11.0t.

(b) Use the above equation to estimate Norman's pulse rate 2.5 minutes after stopping the
exercise.

Reg's pulse rate 2.5 minutes after stopping the exercise was 100.
The full data for Reg are summarised by the following statistics:
n = 8, $\sum t = 19.5$, $\sum t^2 = 63.75$, $\sum P = 829$, $\sum tP = 1867$

(c) Find the equation of the regression line of P on t for Reg's data.

(d) State, giving a reason, which of Reg or Norman you consider to be the fitter.


Solution 2.1 


(a) [Figure: scatter diagram to show Norman's data, pulse rate P against time t minutes.]

(b) P = 122.3 − 11.0t
so when t = 2.5, P = 122.3 − 11.0 × 2.5 = 94.8

(c) The regression line of P on t for Reg's data is P = a + bt

where $a = \bar{P} - b\bar{t}$ and $b = \frac{s_{tP}}{s_{tt}} = \frac{S_{tP}}{S_{tt}}$

$\bar{P} = \frac{\sum P}{n} = \frac{829}{8} = 103.625$, $\bar{t} = \frac{\sum t}{n} = \frac{19.5}{8} = 2.4375$

To find b using the small s format:

$s_{tP} = \frac{1}{n}\sum tP - \bar{t}\bar{P} = \frac{1}{8} \times 1867 - 103.625 \times 2.4375 = -19.210\ldots$

$s_{tt} = \frac{1}{n}\sum t^2 - \bar{t}^2 = \frac{1}{8} \times 63.75 - 2.4375^2 = 2.027\ldots$

$b = \frac{s_{tP}}{s_{tt}} = \frac{-19.210\ldots}{2.027\ldots} = -9.4759\ldots$

To find b using the big S format:

$S_{tP} = \sum tP - \frac{\sum t \sum P}{n} = 1867 - \frac{19.5 \times 829}{8} = -153.6875$

$S_{tt} = \sum t^2 - \frac{(\sum t)^2}{n} = 63.75 - \frac{19.5^2}{8} = 16.21875$

$b = \frac{S_{tP}}{S_{tt}} = \frac{-153.6875}{16.21875} = -9.4759\ldots$

To calculate a, use
$a = \bar{P} - b\bar{t} = 103.625 - (-9.4759) \times 2.4375 = 126.72\ldots$

The regression line P on t for Reg is P = 126.7 − 9.5t.

(d) Norman is fitter as his pulse rate decreases more rapidly. This can be seen from the
gradients of the regression lines: the gradient for Norman is −11.0 and the gradient for
Reg is −9.5.
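When only summary statistics are known, as for Reg, the big S formulae can be applied directly. The Python sketch below does this. The function name is hypothetical, with t playing the role of x and P the role of y. The final comparison simply restates part (d): a more negative gradient means the pulse rate falls faster.

```python
def regression_from_summary(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least squares line y = a + bx from summary statistics (big S formulae)."""
    S_xy = sum_xy - sum_x * sum_y / n
    S_xx = sum_x2 - sum_x ** 2 / n
    b = S_xy / S_xx
    a = sum_y / n - b * sum_x / n   # a = y_bar - b * x_bar
    return a, b

# Reg's summary statistics, with t in the role of x and P in the role of y
a_reg, b_reg = regression_from_summary(n=8, sum_x=19.5, sum_y=829, sum_x2=63.75, sum_xy=1867)
b_norman = -11.0   # gradient of Norman's line, P = 122.3 - 11.0t

# The format string assumes a negative gradient, as here
print(f"Reg: P = {a_reg:.1f} - {abs(b_reg):.1f}t")
# The steeper (more negative) gradient means the pulse rate returns to normal faster
fitter = "Norman" if b_norman < b_reg else "Reg"
print(fitter)
```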




Drawing a regression line on a scatter diagram 


Example 2.2 
The following data represent the lengths (x) and breadths (y) of 12 cuckoos' eggs measured in
millimetres.
x   22.3  23.6  24.2  22.6  22.3  22.3  22.1  23.3  22.2  22.2  21.8  23.2
y   16.5  17.1  17.3  17.0  16.8  16.4  17.2  16.8  16.7  16.2  16.6  16.4
Draw a scatter diagram for the data.
Obtain the least squares regression line of y on x and plot this on the scatter diagram.
(NEAB)


Solution 2.2 
The scatter diagram is shown below together with the regression line.
To find the equation of the regression line, use the formulae or find it directly on the calculator,
where you should find that A = 11.473 122... and B = 0.232 717 9... Giving values to four
significant figures, the equation of the least squares regression line of y on x is

y = 11.47 + 0.2327x

To plot the line on the scatter diagram, you need to work out three points on the line,
including $(\bar{x}, \bar{y})$, the mean of each set of data.

From the calculator, $\bar{x} = 22.675$ and $\bar{y} = 16.75$, so plot $(\bar{x}, \bar{y})$ as accurately as you can.

Now choose two other x-coordinates and calculate the y value for each. The x-coordinates
should be within the range of data, perhaps at the extremities.

Choosing x = 21.8 and x = 24.2:

When x = 21.8, y = 11.47 + 0.2327 × 21.8 = 16.54..., so plot (21.8, 16.54).
To obtain this directly on the calculator key in [21.8] [SHIFT] [ŷ] to give 16.546...

When x = 24.2, y = 11.47 + 0.2327 × 24.2 = 17.10..., so plot (24.2, 17.1).
Directly on the calculator: [24.2] [SHIFT] [ŷ] gives 17.1048...

Now draw the regression line, joining the three points, but do not take the line beyond the
range of the data.
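The three plotting points can be generated in Python, as in the sketch below. This is an illustration only. The lists hold the twelve pairs from Example 2.2. Using the smallest and largest x as the extra points follows the suggestion above to work at the extremities of the data.

```python
# Example 2.2 data: lengths (x) and breadths (y) of 12 cuckoo eggs, in mm
lengths = [22.3, 23.6, 24.2, 22.6, 22.3, 22.3, 22.1, 23.3, 22.2, 22.2, 21.8, 23.2]
breadths = [16.5, 17.1, 17.3, 17.0, 16.8, 16.4, 17.2, 16.8, 16.7, 16.2, 16.6, 16.4]

n = len(lengths)
x_bar = sum(lengths) / n
y_bar = sum(breadths) / n

# Centred big S formulae: S_xy = sum (x - x_bar)(y - y_bar), S_xx = sum (x - x_bar)^2
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lengths, breadths))
S_xx = sum((x - x_bar) ** 2 for x in lengths)
b = S_xy / S_xx
a = y_bar - b * x_bar

# Three plotting points: the smallest x, the mean point and the largest x
points = [(x, a + b * x) for x in (min(lengths), x_bar, max(lengths))]
print(f"y = {a:.2f} + {b:.4f}x")
for x, y in points:
    print(f"({x:.3f}, {y:.3f})")
```

The centred form $\sum (x - \bar{x})(y - \bar{y})$ is used because $\sum xy$ and $n\bar{x}\bar{y}$ are large and nearly equal for these data. Centring avoids the loss of precision that subtracting them would cause.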


Scatter diagram to show the lengths (x) and breadths (y) of 12 cuckoo eggs. 




Least squares regression line x on y 


In Example 2.2, the regression line y on x would be used to estimate the breadth of a cuckoo's
egg, y, for a given value of the length, x. Note that neither the length nor the breadth of the
cuckoo's egg is controlled, so there is no independent variable. If you wanted to estimate the
length, x, for a given value of the breadth, y, you would use a different line, the regression line
x on y.

The least squares regression line x on y is used

• when neither variable is controlled and you want to estimate x for a given value of y,

• when y is the controlled (independent) variable and you want to estimate x for a given
value of y, or y for a given value of x.

[Figure: least squares regression line x on y, with the horizontal distances from the points to the line shown.]

This time the horizontal distances $n_1, n_2, n_3, \ldots$ from the
points to the line are considered.

The sum of their squares,
$\sum n_i^2 = n_1^2 + n_2^2 + n_3^2 + \cdots$

is made as small as possible, i.e. the line is drawn so that
$\sum n_i^2$ is a minimum.


The equation of the regression line x on y 


The equation of the regression line x on y is often written in the form

$x = c + dy$ where $c = \bar{x} - d\bar{y}$

and

$d = \frac{s_{xy}}{s_{yy}}$ or $d = \frac{S_{xy}}{S_{yy}}$

See page 122 for the formulae for $s_{xy}$, $s_{yy}$, $S_{xy}$ and $S_{yy}$.

Also, since the line goes through $(\bar{x}, \bar{y})$, the equation can be written
$x - \bar{x} = d(y - \bar{y})$

d is known as the regression coefficient of x on y.

Note, however, that d is not the gradient of the line. This can be seen by rearranging the
equation x = c + dy:

$dy = x - c$

$y = \frac{1}{d}x - \frac{c}{d}$

So the gradient of the regression line x on y is $\frac{1}{d}$ and the y-intercept is $-\frac{c}{d}$.


Considering the data of Example 2.2, the summary information is
$\sum x = 272.1$, $\sum x^2 = 6175.69$, $\sum y = 201$, $\sum y^2 = 3368.08$, $\sum xy = 4559.03$ and n = 12.

To find the equation of the regression line x on y in the form x = c + dy, calculate c and d as
follows.

Find $\bar{x} = \frac{\sum x}{n} = \frac{272.1}{12} = 22.675$ and $\bar{y} = \frac{\sum y}{n} = \frac{201}{12} = 16.75$

To calculate d using the small s format:

$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1}{12} \times 4559.03 - 22.675 \times 16.75 = 0.11291\ldots$

$s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{1}{12} \times 3368.08 - 16.75^2 = 0.11083\ldots$

$d = \frac{s_{xy}}{s_{yy}} = \frac{0.11291\ldots}{0.11083\ldots} = 1.0187\ldots$

To calculate d using the big S format:

$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 4559.03 - \frac{272.1 \times 201}{12} = 1.355$

$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 3368.08 - \frac{201^2}{12} = 1.33$

$d = \frac{S_{xy}}{S_{yy}} = \frac{1.355}{1.33} = 1.0187\ldots$

To calculate c, use $c = \bar{x} - d\bar{y} = 22.675 - 1.0187\ldots \times 16.75 = 5.6101\ldots$

The equation of the regression line x on y is x = 5.61 + 1.02y.
It is interesting to plot this on the scatter diagram, together with the regression line y on x.

You know that the line must go through $(\bar{x}, \bar{y})$, so plot (22.675, 16.75). Now choose two
y-coordinates, say y = 16.4 and y = 17.0, and calculate the value of x.

When y = 16.4, x = 5.61 + 1.02 × 16.4 = 22.3 (1 d.p.)
When y = 17.0, x = 5.61 + 1.02 × 17.0 = 22.95
Plot (22.3, 16.4) and (22.95, 17.0) and join the three points with a straight line.
Notice that the lines are not the same; in fact they are quite far apart. You will see later
(page 139) that this indicates that the correlation is not very strong.
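The x on y calculation, and the comparison of the two gradients, can be sketched in Python as follows. This is an illustration using the Example 2.2 data. The printed gradient of the x on y line is $\frac{1}{d}$, measured with y on the vertical axis as on the scatter diagram. A large difference between it and b is what makes the two lines "quite far apart".

```python
# Example 2.2 data: lengths (x) and breadths (y) of 12 cuckoo eggs, in mm
lengths = [22.3, 23.6, 24.2, 22.6, 22.3, 22.3, 22.1, 23.3, 22.2, 22.2, 21.8, 23.2]
breadths = [16.5, 17.1, 17.3, 17.0, 16.8, 16.4, 17.2, 16.8, 16.7, 16.2, 16.6, 16.4]

n = len(lengths)
x_bar = sum(lengths) / n
y_bar = sum(breadths) / n

# Centred big S formulae
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lengths, breadths))
S_xx = sum((x - x_bar) ** 2 for x in lengths)
S_yy = sum((y - y_bar) ** 2 for y in breadths)

b = S_xy / S_xx          # regression coefficient of y on x (its gradient)
d = S_xy / S_yy          # regression coefficient of x on y
c = x_bar - d * y_bar    # the x on y line also passes through (x_bar, y_bar)

print(f"x on y: x = {c:.2f} + {d:.2f}y")
print(f"gradient of y on x: {b:.3f}, gradient of x on y: {1 / d:.3f}")
```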


Using a calculator to find the regression line x on y

The procedure, using linear regression mode, is similar to that described on page 126 for
calculating the line y on x. This time, however, input the data with the y-coordinate first. For
the equation x = c + dy, the value of c is given by A on the calculator, and the value of d by B.

The method is illustrated below, using the data for the cuckoos' eggs given in Example
2.2 on page 128.


Casio 85W/85WA/570W

Set LR mode                [MODE] [3] [1]  or  [MODE] [MODE] [2] [1]
Clear memories             [SHIFT] [Scl] [=]
Input data                 [16.5] [,] [22.3] [DT]
                           [17.1] [,] [23.6] [DT]
                           [17.3] [,] [24.2] [DT]
                           (continue in the same way, entering y first)
                           [16.4] [,] [23.2] [DT]

You now have access to:
A = 5.610... (c)           [SHIFT] [7] [=]
B = 1.0187... (d)          [SHIFT] [8] [=]
Equation of regression line: x = c + dy, so x = 5.61 + 1.02y

You can check the following:
$\sum y^2 = 3368.08$        [RCL] [A]
$\sum y = 201$              [RCL] [B]
n = 12                     [RCL] [C]
$\sum x^2 = 6175.69$        [RCL] [D]
$\sum x = 272.1$            [RCL] [E]
$\sum xy = 4559.03$         [RCL] [F]
$\bar{y} = 16.75$           [SHIFT] [1] [=]
$s_{yy} = 0.1108\ldots$     [SHIFT] [2] [=] [x²] [=]
$\bar{x} = 22.675$          [SHIFT] [4] [=]
$s_{xx} = 0.4852\ldots$     [SHIFT] [5] [=] [x²] [=]
To clear LR mode           [MODE] [1]

Note that the calculator labels its values according to the order in which the data were entered, so you
need to read y for x and x for y when checking these.


Example 2.3 


A student found the following data for the female life expectancy, x years, and the Gross
Domestic Product (GDP) per head, y dollars, in six countries in South Asia in 1988.

Country        x     y
Afghanistan    42    143
Bangladesh     50    179
Bhutan         47    197
India          58    335
Pakistan       57    384
Sri Lanka      73    423

[n = 6, $\sum x = 327$, $\sum y = 1661$, $\sum x^2 = 18\,415$, $\sum y^2 = 529\,909$, $\sum xy = 96\,412$]

(a) It is required to estimate the value of x for Nepal, where the value of y was 160.
(i) Find the equation of a suitable line of regression. Simplify it as far as
possible, giving the constants correct to three significant figures.
(ii) Use your equation to obtain the required estimate.

(b) Use your equation to estimate the value of x for North Korea, where the value of y
was 858.
Comment on your answer.


Solution 2.3
(a) (i) Neither variable has been controlled in the given data. Since you are asked to
estimate the life expectancy, x years, when the Gross Domestic Product per head
is 160 dollars, it is sensible to use the regression line of x on y.

The least squares regression line of x on y has equation

$x = c + dy$ where $c = \bar{x} - d\bar{y}$ and $d = \frac{s_{xy}}{s_{yy}} = \frac{S_{xy}}{S_{yy}}$

$\bar{x} = \frac{\sum x}{n} = \frac{327}{6} = 54.5$, $\bar{y} = \frac{\sum y}{n} = \frac{1661}{6} = 276.83\ldots$

Using small s format to find d:

$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1}{6} \times 96\,412 - \frac{327}{6} \times \frac{1661}{6} = 981.25$

$s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{1}{6} \times 529\,909 - \left(\frac{1661}{6}\right)^2 = 11\,681.47\ldots$

$d = \frac{s_{xy}}{s_{yy}} = \frac{981.25}{11\,681.47\ldots} = 0.08400\ldots$

Using big S format to find d:

$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 96\,412 - \frac{327 \times 1661}{6} = 5887.5$

$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 529\,909 - \frac{1661^2}{6} = 70\,088.83\ldots$

$d = \frac{S_{xy}}{S_{yy}} = \frac{5887.5}{70\,088.83\ldots} = 0.08400\ldots$

Calculate c using

$c = \bar{x} - d\bar{y} = \frac{327}{6} - 0.08400\ldots \times \frac{1661}{6} = 31.24\ldots$

The equation of the regression line of x on y is x = 31.2 + 0.0840y (3 s.f.).

(ii) When y = 160, x = 31.2 + 0.0840 × 160 = 45 (2 s.f.)

The estimated value of the life expectancy in Nepal is 45 years.

(b) From the equation, when y = 858,
x = 31.2 + 0.0840 × 858 = 103 (3 s.f.)

This would give the life expectancy in North Korea as 103 years, which is clearly not
sensible. The value y = 858 is a long way outside the range of the data, and should not
be used to estimate a value of x.
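The same working can be sketched in Python from the summary statistics alone. An explicit range check is added to reflect the comment in part (b). The function name `estimate_x` and the range-check behaviour (returning `None`) are illustrative assumptions and are not part of the examination method. The limits 143 and 423 are the smallest and largest values of y in the table.

```python
# Summary statistics from Example 2.3 (x = life expectancy, y = GDP per head)
n = 6
sum_x, sum_y = 327, 1661
sum_y2, sum_xy = 529_909, 96_412

# Regression line x on y: x = c + dy, using the big S formulae
S_xy = sum_xy - sum_x * sum_y / n
S_yy = sum_y2 - sum_y ** 2 / n
d = S_xy / S_yy
c = sum_x / n - d * sum_y / n

Y_MIN, Y_MAX = 143, 423   # smallest and largest values of y in the table

def estimate_x(y):
    """Estimate x from x = c + dy, refusing to extrapolate beyond the observed y values."""
    if not Y_MIN <= y <= Y_MAX:
        return None   # outside the data: any estimate would be unreliable
    return c + d * y

print(f"x = {c:.1f} + {d:.4f}y")
print(round(estimate_x(160)))   # Nepal: 45
print(estimate_x(858))          # North Korea: None (extrapolation)
```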


Note on using the calculator in LR mode 


You should check whether the regulations of your examination board permit you to use the 
calculator in LR mode to find the equation of the regression lines without showing any 
supporting working. The equations are quick to find using the calculator, but a disadvantage 
is that if you make a slip when entering the data, your answer will be wrong, and this would 
result in the loss of all the marks. Supported by calculation, however, your answer, though 
wrong, would receive marks for method. 


Sometimes data are presented in such a way that it is not possible to find the equations of the
regression lines directly using the LR mode. This is the case when, for example, you do not
know the raw data, but just the values of the summary statistics, $\sum x$, $\sum x^2$, $\sum y$, $\sum y^2$, $\sum xy$ and n.


If data are presented just in this form, then the appropriate formula must be used and the 
values calculated. 


Consider also when data are given as in the following two examples: 


Example 2.4 


For a given set of data it is known that $\bar{x} = 10$ and $\bar{y} = 4$. The gradient of the regression line y
on x is 0.6.

Find the equation of this regression line and estimate y when x = 12.

Solution 2.4

The equation of the regression line is y = a + bx, where b = 0.6.

y = a + 0.6x
The regression line goes through $(\bar{x}, \bar{y})$, so $\bar{y} = a + 0.6\bar{x}$
4 = a + 0.6 × 10
a = −2

Equation of regression line is y = −2 + 0.6x

When x = 12, y = −2 + 0.6 × 12 = 5.2


Example 2.5 


Find the equation of the regression line of x on y if the line goes through (1, 4) and has
gradient 2.

Solution 2.5

Equation of regression line x on y is x = c + dy

Rearranging: dy = x − c
$y = \frac{1}{d}x - \frac{c}{d}$
Gradient $= \frac{1}{d}$
$\frac{1}{d} = 2$
d = 0.5
So x = c + 0.5y
You are given that (1, 4) lies on the line:
1 = c + 0.5 × 4
c = −1
The equation of the regression line x on y is x = −1 + 0.5y.
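Examples 2.4 and 2.5 both use a known point on the line together with a known gradient. The Python sketch below reproduces them. The helper names are hypothetical. The second function uses the fact that the gradient of x = c + dy, drawn with y on the vertical axis, is $\frac{1}{d}$.

```python
def intercept_through_mean(x_bar, y_bar, b):
    """Intercept a of y = a + bx, using the fact that (x_bar, y_bar) lies on the line."""
    return y_bar - b * x_bar

def x_on_y_from_point_and_gradient(x0, y0, gradient):
    """Return (c, d) for x = c + dy, given a point on the line and its gradient 1/d."""
    d = 1 / gradient
    c = x0 - d * y0
    return c, d

# Example 2.4: x_bar = 10, y_bar = 4, gradient 0.6
a = intercept_through_mean(10, 4, 0.6)
y_at_12 = a + 0.6 * 12
print(f"y = {a:g} + 0.6x, estimate at x = 12: {y_at_12:.1f}")

# Example 2.5: line x on y through (1, 4) with gradient 2
c, d = x_on_y_from_point_and_gradient(1, 4, 2)
print(f"x = {c:g} + {d:g}y")
```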


Exercise 2a Equations of least squares regression lines

Use the method you prefer for calculating the equations of the regression lines. It is a good idea to be able to use the
calculator in LR mode and to be competent at using the formulae.

1. For each set of data, find
(a) the equation of the regression line of y on x,
(b) the equation of the regression line of x on y.
Plot them both on a scatter diagram and
comment.

Data set 1


yo 7 9 ML 14 14 15 21 22 23° 26 


y Edo 12 40 17 23 te 10 20 28 


Data set 2
x      y
5      82
5      85
5      89
6      78
7.5    66
7.5    77
7.5    81
10     70
…      …
12.5   65
14     69
14.5   63


2. The following data show, in convenient units,
the yield (y) of a chemical reaction run at various
different temperatures (x):


Temperature (x)   Yield (y)
110               2.4
120               4.3
130               3.4
140               3.4
150               2.9
160               5.5
170               3.3


(a) Plot the data. Comment on whether it
appears that the usual simple linear
regression model is appropriate.

(b) Assuming that such a model is appropriate,
estimate the regression line of yield on
temperature.

(c) Draw your estimated line on your graph, and
indicate clearly on your graph the distances,
the sum of whose squares is minimised by
the linear regression procedure. (MEI)


3. In a certain heathland region there is a large
number of alder trees where the ground is
marshy but very few where the ground is dry.
The number (x) of alder trees and the ground
moisture content (y) are found in each of
ten equal areas (which have been chosen to cover
the range of x in all such areas). The following is
a summary of the results of the survey:

$\sum x = 500$, $\sum y = 300$, $\sum x^2 = 27\,818$,
$\sum xy = 16\,837$, $\sum y^2 = 10\,462$

Find the equation of the regression line of y on x.

Estimate the ground moisture content in an area
equal to one of the chosen areas which contains
60 alder trees. (O & C)


4. To test the effect of a new drug twelve patients
were examined before the drug was administered
and given an initial score (I) depending on the
severity of various symptoms. After taking the
drug they were examined again and given a final
score (F). A decrease in score represented an
improvement. The scores for the twelve patients
are given in the table below.

            Score
Patient   Initial (I)   Final (F)
1         61            49
2         23            12
3         8             3
4         14            4
5         42            28
6         34            27
7         32            20
8         31            20
9         41            34
10        25            15
11        20            16
12        50            40

Calculate the equation of the line of regression of
F on I.

On the average what improvement would you
expect for a patient whose initial score was 30?
(MEI)


5. For a given set of data
$\sum x = 15$, $\sum x^2 = 55$, $\sum y = 43$, $\sum y^2 = 397$,
$\sum xy = 145$, n = 5
Find the equations of the regression lines y on x,
and x on y.


6. The following table shows the marks (x)
obtained in a Christmas examination and the
marks (y) obtained in the following summer
examination by a group of nine students.

Student   Christmas (x)   Summer (y)
A         57              66
B         35              51
C         56              63
D         57              34
E         66              47
F         79              70
G         81              84
H         84              84
I         52              53

It is given that $\sum x = 567$, $\sum y = 552$, $\sum xy = 36\,261$,
$\sum x^2 = 37\,777$, $\sum y^2 = 36\,112$.

(a) Find the equation of the estimated least
squares regression line of y on x.

(b) A tenth student obtained a mark of 70 in the
Christmas examination but was absent from
the summer examination. Estimate the mark
that this student would have obtained in the
summer examination. (C)


7. For a period of three years a company monitors
the number of units of output produced per
quarter and the total cost of producing the units.
The table below shows their results.

Units of output (x), 1000s   Total cost (y), £1000s
OE 
14. 35. 
29. 50 
55 : yee 
74 Q 930s 
11 3h 
23 42 
47. 65. 
69 86 
18 ks 
36 34 
61 81 
79 : 96 


(Use $\sum x^2 = 28\,740$; $\sum xy = 38\,286$)

(a) Draw a scatter diagram of these data.

(b) Calculate the equation of the regression line
of y on x and draw this line on your scatter
diagram.

The selling price of each unit of output is £1.60.

(c) Use your graph to estimate the level of
output at which the total income and total
costs are equal.

(d) Give a brief interpretation of this value.

(AEB)


8. From a set of pairs of observations of the
variables x and y, it is found that the regression
line of y on x passes through the point (0, 1.8). If
the means of the x and y values are 5.0 and 8.3
respectively, find the equation of the regression
line of y on x in the form y = a + bx. (L)


9. For a set of 20 pairs of observations of the
variables x and y, it is known that $\sum x = 250$,
$\sum y = 140$, and that the regression line of y on x
passes through (15, 10). Find the equation of the
regression line of y on x and use it to estimate y
when x = 10.

10. The gradient of the regression line x on y is −0.2
and the line passes through (0, 3). If the equation
of the line is x = c + dy, find the values of c and d
and sketch the line on a diagram.


11. A small firm negotiates an annual pay rise with
each of its twelve employees. In an attempt to
simplify the process it is proposed that each
employee should be given a score (x) based on
his/her level of responsibility. The annual salary
(y) will be £(a + bx) and the annual negotiations
will only involve the values of a and b. The
following table gives last year's salaries (which
were generally accepted as fair) and the proposed
scores.

Employee   Score (x)   Annual salary, £ (y)
A          10          5 750
B          55          17 300
C          46          14 750
D          27          8 200
E          17          6 350
F          12          6 150
G          85          18 800
H          64          14 850
I          36          9 900
J          40          11 000
K          30          9 150
L          37          10 400

(You may assume that $\sum x = 459$, $\sum x^2 = 22\,889$,
$\sum y = 132\,600$, $\sum xy = 6\,094\,750$)

(a) Plot the data on a scatter diagram.

(b) Estimate values that could have been used
for a and b last year by fitting the regression
line y = a + bx to the data. Draw the line on
the scatter diagram.

(c) Comment on whether the suggested method
is likely to prove reasonably satisfactory in
practice.


(d) Without recalculating the regression line find
the appropriate values of a and b if every
employee were to receive a rise of (i) £500 a
year, (ii) 8%, (iii) 4% plus £300 per year.

(e) Two employees, B and C, had to work away
from home for a large part of the year.
In the light of this additional information,
suggest an improvement to the model.
(AEB)

12. In a regression calculation for five pairs of
observations one pair of values was lost when
the data were filed. For the regression of y on x
the equation was calculated as
y= 2-04 


The four recorded pairs of values are

x   0.1   0.2   0.4   0.3
y   0.1   0.3   0.7   0.4

Find the missing pair of values, using the
following data for the four pairs above:

$\sum x = 1$, $\sum x^2 = 0.3$, $\sum xy = 0.47$, $\sum y = 1.5$.
(MEI)


13. In an attempt to increase the yield (kg/h) of an
industrial process a technician varies the
percentage of a certain additive used, while
keeping all other conditions as constant as
possible. The results are shown below.

Yield, y   % additive, x
127.6      2.5
130.2      3.0
132.7      3.5
133.6      4.0
133.9      4.5
133.8      5.0
133.3      5.5
131.9      6.0

You may assume that $\sum x = 34$, $\sum y = 1057$,
$\sum xy = 4504.55$, $\sum x^2 = 155$.

(a) Draw a scatter diagram of the data.

(b) Calculate the equation of the regression line
of yield on percentage additive and draw it
on the scatter diagram.

The technician now varies the temperature (°C)
while keeping other conditions as constant as
possible and obtains the following results:

Yield, y   Temperature, t
127.6      70
128.7      75
130.4      80
134.2      85
133.6      90

He calculates (correctly) that the regression line
is y = 107.1 + 0.298t.

(c) Draw a scatter diagram of these data
together with the regression line.

(d) The technician reports as follows: 'The
regression coefficient of yield on percentage
additive is larger than that of yield on
temperature, hence the most effective way of
increasing the yield is to make the
percentage additive as large as possible,
within reason.'

Criticise the report and make your own
recommendations on how to achieve the
maximum yield.
THE PRODUCT-MOMENT CORRELATION COEFFICIENT, r

The product-moment correlation coefficient, r, is a numerical value between −1 and 1
inclusive, i.e. $-1 \le r \le 1$.

r = 1 indicates perfect positive linear correlation.
r = −1 indicates perfect negative linear correlation.
r = 0 indicates no linear correlation.

The nearer the value of r is to 1 or −1, the closer the points on the scatter diagram are to the
regression line.


Here are some examples of the value of r:

[Figure: scatter diagrams showing perfect negative correlation (r = −1), high negative correlation (r = −0.8), no correlation, some positive correlation (r = 0.5) and perfect positive correlation (r = 1).]


Plotting the two regression lines, y on x and x on y, on a scatter diagram can also give an
indication of the value of r. The closer the two lines are together, the nearer the value of r is to
1 or −1. This is illustrated in the following diagrams:

[Figure: the regression lines y on x and x on y for perfect positive correlation (the lines coincide), strong positive correlation, some positive correlation (r = 0.5), no correlation (r = 0), some negative correlation, strong negative correlation (r = −0.9) and perfect negative correlation (the lines coincide).]
r is a very useful measure because it is independent of the units of scale of the variables. It is
calculated as follows.

Using small s format:

$r = \frac{s_{xy}}{\sqrt{s_{xx}}\sqrt{s_{yy}}}$ where $s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y}$, $s_{xx} = \frac{1}{n}\sum x^2 - \bar{x}^2$, $s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2$

Using big S format:

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$

Example 2.6
The following table shows the marks of ten candidates in Physics and Mathematics. Find the
product-moment correlation coefficient and comment on your value.

Mark in Physics (x)       18   20   30   40   46   54   60   80   88   92
Mark in Mathematics (y)   42   54   60   54   62   68   80   66   80   100

Solution 2.6

x     y     x²      y²       xy
18    42    324     1764     756
20    54    400     2916     1080
30    60    900     3600     1800
40    54    1600    2916     2160
46    62    2116    3844     2852
54    68    2916    4624     3672
60    80    3600    6400     4800
80    66    6400    4356     5280
88    80    7744    6400     7040
92    100   8464    10 000   9200
$\sum x = 528$   $\sum y = 666$   $\sum x^2 = 34\,464$   $\sum y^2 = 46\,820$   $\sum xy = 38\,640$

There are ten pairs of data, so n = 10.

$\bar{x} = \frac{528}{10} = 52.8$, $\bar{y} = \frac{666}{10} = 66.6$

To find r using small s format:

$s_{xy} = \frac{1}{10} \times 38\,640 - 52.8 \times 66.6 = 347.52$

$s_{xx} = \frac{1}{10} \times 34\,464 - 52.8^2 = 658.56$

$s_{yy} = \frac{1}{10} \times 46\,820 - 66.6^2 = 246.44$

$r = \frac{347.52}{\sqrt{658.56} \times \sqrt{246.44}} = 0.8626\ldots$

To find r using big S format:

$S_{xy} = 38\,640 - \frac{528 \times 666}{10} = 3475.2$

$S_{xx} = 34\,464 - \frac{528^2}{10} = 6585.6$

$S_{yy} = 46\,820 - \frac{666^2}{10} = 2464.4$

$r = \frac{3475.2}{\sqrt{6585.6 \times 2464.4}} = 0.8626\ldots$

The product-moment correlation coefficient is 0.86 (2 s.f.), indicating good positive
correlation.

Using the calculator in LR mode to find r

The value of r can be found directly, for example:

Casio 85W/85WA/570W
Set LR mode                [MODE] [3] [1]  or  [MODE] [MODE] [2] [1]
Clear memories             [SHIFT] [Scl] [=]
Input data                 [18] [,] [42] [DT]
                           [20] [,] [54] [DT]
                           (continue in the same way for the remaining pairs)
                           [92] [,] [100] [DT]
Output                     r = 0.862...   [SHIFT] [9] [=]
Clear LR mode              [MODE] [1]
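The calculation of r for Example 2.6 can be reproduced in Python using both formats. The sketch below shows that they agree because the factors of $\frac{1}{n}$ cancel. The lists hold the ten pairs of marks from Example 2.6. On Python 3.10 or later, `statistics.correlation(physics, maths)` computes the same Pearson (product-moment) coefficient and can serve as a check.

```python
from math import sqrt

physics = [18, 20, 30, 40, 46, 54, 60, 80, 88, 92]   # x
maths = [42, 54, 60, 54, 62, 68, 80, 66, 80, 100]    # y
n = len(physics)

# Big S format
S_xy = sum(x * y for x, y in zip(physics, maths)) - sum(physics) * sum(maths) / n
S_xx = sum(x * x for x in physics) - sum(physics) ** 2 / n
S_yy = sum(y * y for y in maths) - sum(maths) ** 2 / n
r_big = S_xy / sqrt(S_xx * S_yy)

# Small s format gives the same value, because the factors of 1/n cancel
s_xy, s_xx, s_yy = S_xy / n, S_xx / n, S_yy / n
r_small = s_xy / (sqrt(s_xx) * sqrt(s_yy))

print(f"r = {r_big:.4f}")
```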


NOTE: The value of r should be considered in conjunction with a diagram.


By calculation, or using the data already in your calculator, you should find that the
regression line y on x has equation y = 38.7 + 0.527x. See page 126 if you have forgotten how
to obtain this.

Also, it can be shown that the regression line x on y has equation x = −41.1 + 1.41y. Check
this yourself on your calculator.

The diagram shows the scatter diagram together with these two regression lines.

[Figure: Mark in Mathematics (y) against Mark in Physics (x), with the regression lines y on x and x on y.]

As expected, since r is close to 1, the two regression lines are close together. The scatter
diagram confirms good positive correlation.


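Relationships between regression coefficients and r

The regression line y on x has equation y = a + bx, where $b = \frac{s_{xy}}{s_{xx}}$ or $b = \frac{S_{xy}}{S_{xx}}$.

The regression line x on y has equation x = c + dy, where $d = \frac{s_{xy}}{s_{yy}}$ or $d = \frac{S_{xy}}{S_{yy}}$.

Now

$b \times d = \frac{s_{xy}}{s_{xx}} \times \frac{s_{xy}}{s_{yy}} = \frac{s_{xy}^2}{s_{xx} s_{yy}} = \left(\frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}}\right)^2 = r^2$

or

$b \times d = \frac{S_{xy}}{S_{xx}} \times \frac{S_{xy}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2$

Since $r^2 \ge 0$, this implies that b and d are both positive or both negative (or both zero when r = 0).
In fact b, d and r all have the same sign as $s_{xy}$, because $s_{xx}$ and $s_{yy}$ are positive.

If b and d are positive, then r will be positive and $r = +\sqrt{bd}$.
If b and d are negative, then r will be negative and $r = -\sqrt{bd}$.

In Example 2.6,

b = 0.527 and d = 1.41, so $r^2 = b \times d = 0.743\ldots$
$r = \sqrt{b \times d} = 0.86$ (2 d.p.)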
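The relationship $r^2 = bd$, with r taking the common sign of b and d, can be sketched in Python as below. The function name is hypothetical. `math.copysign` attaches the sign of b to $\sqrt{bd}$.

```python
from math import copysign, sqrt

def r_from_regression_coefficients(b, d):
    """Product-moment correlation coefficient from b (y on x) and d (x on y).

    Uses r^2 = b*d, with r taking the common sign of b and d.
    """
    if b * d < 0:
        raise ValueError("b and d must have the same sign")
    return copysign(sqrt(b * d), b)

print(round(r_from_regression_coefficients(0.527, 1.41), 2))     # 0.86 (Example 2.6)
print(round(r_from_regression_coefficients(-0.527, -1.41), 2))   # -0.86
```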


Example 2.7 


Show that if r = +1, the regression lines of y on x and x on y are identical. (This was 
illustrated in the diagrams on page 139.) 


Solution 2.7 


The regression line y on x, $y = a + bx$, has gradient b.

The regression line x on y, $x = c + dy$, has gradient $\dfrac{1}{d}$.

Now if $r = \pm 1$
then $r^2 = 1$.
Since $r^2 = bd$, $bd = 1$
so $b = \dfrac{1}{d}$.

Therefore the two regression lines have the same gradient.

But you know that they both go through a common point $(\bar{x}, \bar{y})$, so the regression lines must be identical.
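Because both regression lines pass through $(\bar{x}, \bar{y})$, solving the two equations simultaneously recovers the mean point. The Python sketch below (the function name `intersection` is hypothetical, written for this illustration) does this for the two lines of Example 2.6. Since the coefficients there were rounded to three significant figures, the result is only an approximation to $(\bar{x}, \bar{y})$. When $bd = 1$, which for regression lines happens only when $r = \pm 1$, the lines coincide and there is no single crossing point.

```python
def intersection(a, b, c, d):
    """Solve y = a + b*x and x = c + d*y simultaneously (illustrative helper)."""
    # Substitute y = a + b*x into x = c + d*y:
    #   x = c + d*(a + b*x)   =>   x*(1 - b*d) = c + a*d
    if b * d == 1:
        raise ValueError("b*d = 1: for regression lines, r = +1 or -1 and the lines coincide")
    x = (c + a * d) / (1 - b * d)
    y = a + b * x
    return x, y


# Example 2.6: y = 38.7 + 0.527x and x = -41.1 + 1.41y
x_bar, y_bar = intersection(38.7, 0.527, -41.1, 1.41)
print(f"({x_bar:.1f}, {y_bar:.1f})")   # (52.4, 66.3)
```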


Example 2.8 


If r = 0, show that the two regression lines are at right angles. 


Solution 2.8 
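Since $r = \dfrac{s_{xy}}{s_x s_y}$, if $r = 0$, then $s_{xy} = 0$.

Now $b = \dfrac{s_{xy}}{s_{xx}}$, so $b = 0$; also $d = \dfrac{s_{xy}}{s_{yy}}$, so $d = 0$.

The equation of the regression line y on x is $y = a + bx$, but $b = 0$, therefore the equation is $y = a$.

The equation of the regression line x on y is $x = c + dy$, but $d = 0$, therefore the equation is $x = c$.

The line $y = a$ is parallel to the x-axis and the line $x = c$ is parallel to the y-axis, so the two regression lines are at right angles.

[Diagram: the regression line y on x (horizontal) and the regression line x on y (vertical)]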


Important note:

The product-moment correlation coefficient r is a measure of linear correlation only. It is important to consider the value of r in conjunction with a scatter diagram. The following example illustrates this point.


Example 2.9
For each set of bivariate data, find the product-moment correlation coefficient, draw a scatter diagram and then comment on your value of r.


Solution 2.9 


(a) Using a calculator for the first set of data, you should find that r = 0, indicating no linear correlation. But there could be some other relationship between the variables.

You may have noticed that the points all lie on the curve $y = x^2$. There is a relationship between the variables: it is a quadratic one.

[Scatter diagram to illustrate (a)]

NOTE: r = 0 implies that there is no linear correlation between the variables. Either there is no relationship between them at all, or the variables are related in a non-linear way.
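To see numerically that r = 0 does not rule out a relationship, the sketch below computes r for points lying exactly on $y = x^2$. The x-values −2, −1, 0, 1, 2 are an assumption chosen for illustration; they are not necessarily the data of Example 2.9(a). The helper `pearson_r` is written for this example and uses the big S format, $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy).

    Illustrative helper written for this example, not a library function.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # S_xy
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # S_xx
    syy = sum((y - y_bar) ** 2 for y in ys)                       # S_yy
    return sxy / math.sqrt(sxx * syy)


# Assumed data: points lying exactly on the curve y = x^2
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys))   # 0.0, although y is completely determined by x
```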


(b) Using a calculator for the second set of data, you should find that r = 0.86 (2 d.p.), apparently indicating a strong degree of positive correlation.

[Scatter diagram to illustrate (b)]

But you can see from the scatter diagram that there is not strong positive correlation.

The value of r has been distorted by the point (9, 8), known as an outlier.

So a value of r close to 1, or −1, does not necessarily imply a strong degree of linear correlation. Always check by referring to a scatter diagram.
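The effect of a single outlier can be seen by computing r with and without it. In the sketch below the four clustered points are invented for illustration (they are not the data of Example 2.9(b)); only the outlier (9, 8) is taken from the text. The helper `pearson_r` is written for this example.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient (illustrative helper, big S format)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cluster with no linear correlation
cluster_x = [1, 1, 2, 2]
cluster_y = [1, 2, 1, 2]

r_without = pearson_r(cluster_x, cluster_y)
r_with = pearson_r(cluster_x + [9], cluster_y + [8])   # add the outlier (9, 8)
print(f"without outlier: r = {r_without:.2f}")   # 0.00
print(f"with (9, 8):     r = {r_with:.2f}")      # about 0.97
```

One extreme point is enough to turn r = 0 into a value close to 1, which is why the scatter diagram should always be checked.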


1. Calculate the value of the product-moment 


correlation coefficient for the following. Check 
using a calculator in LR mode if possible. 
Comment on your answers. 


(a) 
5 10 15 20. 2 


y 43 59 69 65 82 


To 3 4g G7 8 


12.4 12.8 12.6 13.9 134 13.2 14 14.6 


(d) 


2. For a given set of data

Σx = 680   Σy = 996   Σx² = 20 154
Σy² = 34 670   Σxy = 24 844   n = 30.

Find the product-moment correlation coefficient.


3. The following data relate to the percentage unemployment and percentage change in wages over several years.

% Unemployment (x)   % Change in wages (y)
1.6                  5.0
2.2                  3.2
2.3                  2.7
1.7                  2.1
1.6                  4.1
2.1                  2.7
2.6                  2.9
1.7                  4.6
1.5                  3.5
1.6                  4.4

(a) Calculate the product-moment correlation coefficient between x and y.

(Use Σx = 18.9, Σy = 35.2, Σx² = 37.01, Σy² = 132.22, Σxy = 64.7)

It has been suggested that low unemployment and a low rate of wage inflation cannot exist together.


Exercise 2b Product-moment correlation coefficient 


(b) Without further calculation use your correlation coefficient to explain briefly whether or not you think the suggestion is justified. (L)


4. Twelve students were given a prognostic test at the beginning of a course and their scores X in the test were compared with their scores Y obtained in an examination at the end of the course. The results were as follows:


Student x Y 
A 1 3 
Bo 2: 4 
Cc 2 $ 
D 4 S 
E S 4 
FE 5 8 
G 6 6 
H 7 6 
I Bas 6 
J 8 
Kk et 
L. Qo 10 
Determine the product-moment correlation 
coefficient. 

5. Ten boys compete in throwing a cricket ball, and the table shows the height of each boy (x cm) measured to the nearest centimetre and the distance (y m) to which he can throw the ball.

Boy   x     y
A     122   41
B     124   38
C     133   52
D     138   56
E     144   29
F     156   54
G     158   59
H     161   61
I     164   63
J     168   67

Calculate the product-moment correlation coefficient.
Calculate also the equations of the regression lines of y on x and x on y. (AEB)

NOTE: check your value of r by using the regression coefficients obtained in the equations of the regression lines.
6. The heights h, in centimetres, and weights W, in kilograms, of ten people are measured. It is found that Σh = 1710, ΣW = 760, Σh² = 293 162, ΣhW = 130 628 and ΣW² = 59 390.

Calculate the correlation coefficient between the values of h and W.

What is the equation of the regression line of W on h? (O & C)

7. For a set of data, the equations of the least squares regression lines are
y = 0.648x + 2.64 (y on x) and
x = 0.917y − 1.91 (x on y).
Find the product-moment correlation coefficient for the data.

8. For a given set of data the equations of the least squares regression lines are
y = −0.219x + 20.8 (y on x) and
x = −0.785y + 16.2 (x on y).
Find the product-moment correlation coefficient for the data.

9. For a given set of data, the regression line y on x is y = 0.4 + 1.3x and x on y is x = −0.1 + 0.7y.
Find (a) the product-moment correlation coefficient, (b) x̄ and ȳ.

10. The body and heart masses of fourteen ten-month-old mice are tabulated below:

Body mass (x g)   Heart mass (y mg)
27                118
30                136
37                156
38                150
32                140
36                155
32                157
32                114
38                144
42                159
36                149
44                170
33                131
38                160

(a) Draw a scatter diagram of these data.
(b) Calculate the equation of the regression line of y on x and draw this line on the scatter diagram.
(c) Calculate the product-moment coefficient of correlation. (AEB)

SPEARMAN'S COEFFICIENT OF RANK CORRELATION, $r_s$

You have used the product-moment correlation coefficient, r, as a measure of the strength of the correlation between the paired data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. This is reasonable provided that both x and y can be measured. Sometimes it is not possible to measure certain variables, but it is possible to arrange them in order.


For example, if two wine experts were asked to place six wines in order of preference, they would rank the six wines in order, using the numbers 1, 2, 3, 4, 5, 6.

The wine they liked best would be ranked 1.

The wine they liked least would be ranked 6.

It is possible to measure the strength of the correlation between the two rankings by using Spearman's coefficient of rank correlation, $r_s$.


In general, this is obtained as follows:

• Assign ranks 1, 2, 3, ..., n to the values of each variable. This can be done by putting the values in descending order or in ascending order, but whichever you choose, you must use the same rule for both sets of data.

• For each pair of values, calculate d, where d = rank x − rank y.

• Calculate $r_s$ using the formula
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
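The formula translates directly into code. The Python sketch below assumes the ranks have already been assigned and that there are no tied ranks; the function name `spearman_rs` is our own, not a library function. Identical rankings give 1 and exactly reversed rankings give −1.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)), with d = rank x - rank y.

    Illustrative helper; assumes two rankings of the same items with no ties.
    """
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rankings of the same length, with n >= 2")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


ranks = [1, 2, 3, 4, 5]
print(spearman_rs(ranks, ranks))         # 1.0  (identical rankings)
print(spearman_rs(ranks, ranks[::-1]))   # -1.0 (rankings in reverse order)
```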


Consider this example: The five finalists in the County Dog Show were a Bulldog, a Poodle, a Red Setter, a Terrier and a Cocker Spaniel. Two judges ranked the dogs in order of preference. The dog they liked best was ranked 1 and the results are shown in the table:


Dog, 
Bulldog Poodle Setter ‘Terrier Spaniel 
Judge z ; 2. 3 4 5 
d=rank x— rank y 4. 2 - 4 ; ; 
To calculate Spearman's rank correlation coefficient, use
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \quad \text{with } n = 5 \text{ and } \sum d^2 = 14$$
So
$$r_s = 1 - \frac{6 \times 14}{5 \times (25 - 1)} = 1 - \frac{7}{10} = 0.3$$


But what does this value of $r_s$ tell you? In fact, Spearman's rank correlation coefficient is the product-moment correlation coefficient calculated from the ranks (provided there are no tied ranks), and is such that
$$-1 \le r_s \le 1$$
$r_s = 0.3$ indicates a weak positive correlation between the two rankings. Put another way, it indicates a small degree of agreement between the two judges.

$r_s = +1$ means that the rankings are in perfect agreement.

$r_s = 0$ means that there is no correlation between the rankings.

$r_s = -1$ means that the rankings are in complete disagreement. In fact they are in exact reverse order.


To illustrate this, consider three different sets of judges at the Dog Show:

First pair of judges (perfect agreement):

        Bulldog  Poodle  Setter  Terrier  Spaniel
A          1        2       3       4        5
B          1        2       3       4        5
d          0        0       0       0        0
d²         0        0       0       0        0      Σd² = 0

$r_s = 1 - 0 = 1$ and the rankings are in perfect agreement.

Second pair of judges (no correlation):

        Bulldog  Poodle  Setter  Terrier  Spaniel
C          1        2       3       4        5
D          4        1       3       5        2
d         −3        1       0      −1        3
d²         9        1       0       1        9      Σd² = 20

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 20}{5 \times 24} = 1 - 1 = 0$$
and there is no correlation between the rankings.

Third pair of judges (complete disagreement):

        Bulldog  Poodle  Setter  Terrier  Spaniel
E          1        2       3       4        5
F          5        4       3       2        1
d         −4       −2       0       2        4
d²        16        4       0       4       16      Σd² = 40

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 40}{5 \times 24} = 1 - 2 = -1$$
and the rankings are in exact reverse order.


NOTE: the difference between the ranks, d, could be positive or negative. Since you are going to square this value to obtain d², you could just write the numerical value of the difference in the table. This is written |d|, so in the table above, for Bulldog,

Rank E − Rank F = 1 − 5 = −4, so |d| = 4 and d² = 16.


Example 2.10
The marks of eight candidates in English and Mathematics are:

Candidate         1   2   3   4   5   6   7   8
English (x)      50  58  35  86  76  43  40  60
Mathematics (y)  65  72  54  82  32  74  40  53


Rank the results and hence find Spearman’s rank correlation coefficient between the two sets 
of marks. Comment on the value obtained. 


Solution 2.10 
There are eight pairs of data, so n = 8. Ranking the lowest mark 1 and the highest mark 8 gives the ranks as shown in the table.

English (x)   50  58  35  86  76  43  40  60
Maths (y)     65  72  54  82  32  74  40  53
Rank x         4   5   1   8   7   3   2   6
Rank y         5   6   4   8   1   7   2   3
|d|            1   1   3   0   6   4   0   3
d²             1   1   9   0  36  16   0   9    Σd² = 72

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 72}{8(64 - 1)} = 0.14 \text{ (2 d.p.)}$$

Spearman's coefficient of rank correlation is 0.14 (2 d.p.).
This appears to show a very weak positive correlation between the English and Mathematics rankings.


It is interesting to compare the value of $r_s$ with the value of r, the product-moment correlation coefficient.

Using your calculator in linear regression mode, or using the formula, you should find that r = 0.15 (2 d.p.).

The two values of the correlation coefficient are very similar in this example.
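Both coefficients for Example 2.10 can be checked with a short Python sketch. The helpers `ranks_ascending`, `spearman_rs` and `pearson_r` are our own, written for this illustration; the simple ranking used here assumes there are no tied marks, which is true for these data.

```python
import math


def ranks_ascending(values):
    """Rank 1 for the smallest value (illustrative helper; assumes no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman's r_s from raw data: r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    rank_x, rank_y = ranks_ascending(xs), ranks_ascending(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson_r(xs, ys):
    """Product-moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


english = [50, 58, 35, 86, 76, 43, 40, 60]
maths = [65, 72, 54, 82, 32, 74, 40, 53]
print(f"r_s = {spearman_rs(english, maths):.2f}")   # r_s = 0.14
print(f"r   = {pearson_r(english, maths):.2f}")     # r   = 0.15
```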


Plotting a scatter diagram of the marks does not appear to indicate much correlation.

[Scatter diagram: Mathematics mark (y) against English mark (x)]

Spearman's coefficient of rank correlation can be found when data have already been ranked.


Example 2.11
Two judges rank the eight photographs in a competition as follows:

Photograph   A  B  C  D  E  F  G  H
1st judge    2  5  3  6  1  4  7  8
2nd judge    4  3  2  6  1  8  5  7

Calculate Spearman's coefficient of rank correlation for the data.


Solution 2.11 


In this example, the data have already been ranked.

Rank x   2  5  3  6  1   4  7  8
Rank y   4  3  2  6  1   8  5  7
|d|      2  2  1  0  0   4  2  1
d²       4  4  1  0  0  16  4  1    Σd² = 30

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \quad \text{where } n = 8$$
$$= 1 - \frac{6 \times 30}{8(64 - 1)} = 0.64 \text{ (2 d.p.)}$$

Spearman's coefficient of rank correlation for the data is 0.64, indicating some agreement between the two judges.
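When there are no tied ranks, the formula for $r_s$ gives exactly the same value as the product-moment correlation coefficient calculated from the ranks themselves. The Python sketch below checks this for the photograph ranks of Example 2.11; both helpers are written for this illustration and are not library functions.

```python
import math


def spearman_rs(rank_x, rank_y):
    """r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) for rankings with no ties (illustrative)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson_r(xs, ys):
    """Product-moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


judge_1 = [2, 5, 3, 6, 1, 4, 7, 8]
judge_2 = [4, 3, 2, 6, 1, 8, 5, 7]
print(f"formula:        r_s = {spearman_rs(judge_1, judge_2):.4f}")   # 0.6429
print(f"r of the ranks: r   = {pearson_r(judge_1, judge_2):.4f}")     # 0.6429
```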


Note on ranking:
The masses, in kilograms, of five men, 66, 68, 65, 69, 70, ranked in ascending order of magnitude give

x        66  68  65  69  70
Rank x    2   3   1   4   5

If there are two equal values, as in 66, 68, 65, 68, 70, rank as follows:

x        66   68   65   68   70
Rank x    2  3.5    1  3.5    5

Since the 3rd and 4th places represent the same mass of 68 kg, assign the average rank of 3.5 to both places.

Note that if there are more than just a few equal values, this method is not appropriate.
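The averaging rule for equal values can be written as a short Python function. This is a minimal sketch (the name `average_ranks` is our own): each value is given the mean of the positions it would occupy in ascending order, which reproduces the 3.5, 3.5 ranking above.

```python
def average_ranks(values):
    """Ascending ranks; tied values share the mean of the positions they occupy.

    Illustrative helper written for this example.
    """
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1        # first position holding this value
        count = ordered.count(v)            # how many values are equal to it
        ranks.append(first + (count - 1) / 2)
    return ranks


print(average_ranks([66, 68, 65, 69, 70]))   # [2.0, 3.0, 1.0, 4.0, 5.0]
print(average_ranks([66, 68, 65, 68, 70]))   # [2.0, 3.5, 1.0, 3.5, 5.0]
```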


Care must be taken when interpreting the value of the rank correlation coefficient, as illustrated in the following example.

Example 2.12

Find Spearman's rank correlation coefficient for the following data and interpret the value.


Solution 2.12

Rank x   1  2  5  7  4  3  6
Rank y   1  2  5  7  4  3  6
|d|      0  0  0  0  0  0  0

It is obvious that Σd² = 0, so
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - 0 = 1$$
Spearman's coefficient of rank correlation is 1.

This indicates perfect agreement between the rankings, but if you draw a scatter diagram you will see that although there is good positive correlation between the data, it is not perfect, since the points do not lie on a straight line. In fact, if you calculate the product-moment correlation coefficient you will find that r = 0.944.

[Scatter diagram of data, with r = 0.944]


Note that a value of $r_s = 1$ will be obtained for any set of values for which the values of y increase as the values of x increase.

Similarly $r_s = -1$ will be obtained for any set of values for which y decreases as x increases.
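A quick numerical check of this point: in the sketch below the data are hypothetical, with $y = x^2$ for x = 1, 2, ..., 7, so y always increases with x but not linearly. The ranks agree perfectly, giving $r_s = 1$, while r is less than 1. Here $r_s$ is found as the product-moment coefficient of the ranks, which is valid because there are no ties; both helpers are written for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks_ascending(values):
    """Rank 1 for the smallest value (illustrative helper; assumes no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


xs = list(range(1, 8))       # hypothetical data: x = 1, 2, ..., 7
ys = [x ** 2 for x in xs]    # y increases with x, but not linearly
r_s = pearson_r(ranks_ascending(xs), ranks_ascending(ys))
r = pearson_r(xs, ys)
print(f"r_s = {r_s:.3f}, r = {r:.3f}")   # r_s = 1.000, r = 0.977
```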


Exercise 2c Spearman’s coefficient of rank correlation 


1, The table shows the marks awarded to six [pope he ag I etc | 
children in a competition. Calculate a coefficient Recording Critics Readers (hundred) 
of rank correlation for the data: siavuigiints ie GR Le sue ES 

A 9 15 

Chid Re B 4g #6 

Ce € 3 58 

Judge 1 6.87.3 BD IB TV 92 D. 32 49 

Judge. 22..7.85.° 94 7.9 VO 8.96.9 E 30 92 

F 25 37. 

2. At the end of a season a league of eight hockey G 7 10 
clubs produced the following table showing the A 8 90 
position of each club in the league and the rT 26 5S. 

AAC en sen moae ieee Neaeoeae 


average attendances {in hundreds) at home 
matches. Calculate Spearman’s rank correlation coefficient 
for the data. Explain what your result tells you 


Club. Position: Average Attendance about the opinions of these critics and readers, 
(C) 
A 1 27 
B 2 29 4. These are the marks obtained by eight pupils in 
Mathematics and Physics. Calculate Spearman’s 
Cc 3 9 =e A 
coefficient of rank correlation. 
D 4 16 
E 5 24 Mathematics Physics 
F 6 15 67 50 
G 4 12 42 59 
H 8 22. 85 var 
{a) Calculate the Spearman rank correlation 33 a 
coefficient between position in the league 
and average attendance. 97 62 
(b) Comment on your results. {L) 81 80: 
70 76 


3. A record magazine asked critics and readers to vote for the 'Record of the Year' from a short list of nine. The numbers of votes cast were as follows.


Comment on your result. 


5. In a skating competition one judge awards the same mark to all four competitors. Show that the coefficient of rank correlation (Spearman's) is 0.5, irrespective of the marks awarded to the competitors by the other judge.




6. In a study of population density in eight suburbs of a town the statistics shown in the table were obtained. The population density is denoted by p, and the distance of the suburb from the centre of the town by d.

Suburb p (persons/hectare) d (km) 
A 5S 0.7 
B i 3.8 
C 68 1.7 
D 38 2.6 
E 46 Aeey 
F 43 2.6
G 2A 3.4. 
H 25: 1.9 


(a) Plot p against d on a scatter diagram.

(b) Calculate and mark on the diagram the mean of the array.

(c) Calculate a coefficient of rank correlation between p and d, stating the system of ranking adopted for both quantities.

(d) State what conclusions can be drawn from your answers to (a) and (c) concerning the general trend of the results.

(e) Giving a reason for your answer, state which suburb in your opinion fits the general trend least well. (L Additional)


7. Mr and Mrs Brown and their son John all drive the family car. Before ordering a new car they decide to list in order their preferences for five optional extras independently. The rank order of their choices is as shown:

Optional extra             Mr Brown   Mrs Brown   John
Heated rear window           1st         2nd       3rd
Anti-rust treatment          2nd         4th       2nd
Headrests                    3rd         1st       1st
Inertia-reel seat belts      4th         5th       5th
Radio                        5th         3rd       4th

(a) Calculate coefficients of rank correlation between each pair of members of the Brown family.

(b) A salesman offered to supply three of these extras free with the new car. The family agreed to choose those three which were ranked highest by the two members who agreed most. Which three did they choose, and in what order? (L Additional)


8. Seven army recruits (A, B, ..., G) were given two separate aptitude tests. Their orders of merit in each test were

Order of merit   1st  2nd  3rd  4th  5th  6th  7th


‘Ist test G E A D BB CE 


2nd test Deeek po BG CLA 


Find Spearman's coefficient of rank correlation between the two orders and comment on the correlation obtained. (O)

9. A doctor asked ten of his patients, who were smokers, how many years they had smoked. In addition, for each patient, he gave a grade between 0 and 100 indicating the extent of their lung damage. The following table shows the results:


Patient   Number of years smoking   Lung damage grade


A 15 30 | 
B 22 50 | 
Cc 25 55 | 
D 28 30 

E 31 57 

F 33 35 

G 36 60 
H 39 2 

I 42 70 

J 48 75 


Calculate Spearman's coefficient of rank correlation between the number of years of smoking and the extent of lung damage.

Comment on the figure which you obtain. (C Additional)


10. Sketch scatter diagrams for which

(a) the product-moment correlation coefficient is −1,

(b) Spearman's correlation coefficient is +1, but the product-moment correlation coefficient is less than 1.

Five independent observations of the random variables X and Y were:


xX 0 1 4 3 2 


¥ 41 8 3 4 Ee 


Find

(c) the sample product-moment correlation coefficient,

(d) Spearman's correlation coefficient. (O & C)


11. Sketch two scatter diagrams illustrating the following situations:

(a) two variables having a large, negative correlation;

(b) two variables having a small, positive correlation.

The mean rainfall per day and the mean number of hours of sunshine per day observed at a weather station are given below.

Month        Rainfall (mm)   Sunshine (hours)
January      1.26            1.1
February     4.25            2.7
March        0.65            4.5
April        2.40            5.1
May          2.45            5.5
June         2.47            7.6
July         2.84            5.2
August       1.74            5.7
September    2.57            4.3
October      1.65            2.9
November     1.47            2.8
December     1.94            1.8

Calculate, correct to two decimal places, the rank correlation coefficient between rainfall and hours of sunshine.

What is the rank correlation coefficient between rainfall and minutes of sunshine?

12. (a) X and Y were judges at a beauty contest in which there were 10 competitors. Their rankings are shown below.

Competitor   X    Y
A            4    6
B            9    10
C            2    5
D            5=   8
E            3    1
F            10   9
G            5=   7
H            7    4
I            8    2
J            1    3

Calculate a coefficient of rank correlation between these two sets of ranks and comment briefly on your result.

(b) Illustrate by means of two scatter diagrams rank correlation coefficients of 0 and −1 between two variables X and Y. (C Additional)


13. A company is to replace its fleet of cars. Eight possible models are considered and the transport manager is asked to rank them, from 1 to 8, in order of preference. A saleswoman is asked to use each type of car for a week and grade them according to their suitability for the job (A: very suitable, to E: unsuitable). The price is also recorded.


14. 


‘Transport 

manager’s Saleswoman’s Price 
Model ranking grade (£10s) 
RY 5 B oL1 
Tr 1 Be 8th 
U 7 D- SOL 
Vv 2 Cc 792. 
AV 8 Bt $20 
xX 6 D 573 
¥: 4 C+ 683 
Zz 3 A= 716 


(a) Calculate Spearman's rank correlation coefficient between
(i) price and transport manager's rankings,
(ii) price and saleswoman's grades.

(b) Based on the results of (a) state, giving a reason, whether it would be necessary to use all three different methods of assessing the cars.

(c) A new employee is asked to collect further data and to do some calculations. He produces the following results.

The correlation coefficient between
(i) price and boot capacity is 1.2,
(ii) maximum speed and fuel consumption in miles per gallon is −0.7,
(iii) price and engine capacity is −0.9.

For each of his results say, giving a reason, whether you think it is reasonable.

(d) Suggest two sets of circumstances where Spearman's rank correlation coefficient would be preferred to the product-moment correlation coefficient as a measure of association. (AEB)


Candidate A.B Cee pee oR 


English 38°62 56. 420 $9. 48 


64. 84. 84. 60. 73.69 


History. 


The table shows the original marks of six 
candidates in two examinations. Calculate a 
coefficient of rank correlation and comment on 
the value of your results. 


The History papers are re-marked and one of the 
six candidates is awarded five additional marks. 
Given that the other marks, and the coefficient of 
rank correlation, are unchanged, state, with 
reasons, which candidate received the extra 
marks. (C Additional) 


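Summary

• Least squares regression lines

Regression line of y on x
$y = a + bx$
The sum of the squares of the vertical distances of the points from the line is a minimum.

Regression line of x on y
$x = c + dy$
The sum of the squares of the horizontal distances of the points from the line is a minimum.

• Useful formulae for regression and correlation work

Small s format
$$s_{xx} = \frac{\sum x^2}{n} - \bar{x}^2 \qquad s_{yy} = \frac{\sum y^2}{n} - \bar{y}^2 \qquad s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y}$$

Big S format
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \qquad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} \qquad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$

• Least squares regression line y on x is
$y = a + bx$ where $a = \bar{y} - b\bar{x}$ and $b = \dfrac{s_{xy}}{s_{xx}} = \dfrac{S_{xy}}{S_{xx}}$
Alternatively
$y - \bar{y} = b(x - \bar{x})$ where $b = \dfrac{s_{xy}}{s_{xx}} = \dfrac{S_{xy}}{S_{xx}}$

• Least squares regression line x on y is
$x = c + dy$ where $c = \bar{x} - d\bar{y}$ and $d = \dfrac{s_{xy}}{s_{yy}} = \dfrac{S_{xy}}{S_{yy}}$
Alternatively
$x - \bar{x} = d(y - \bar{y})$ where $d = \dfrac{s_{xy}}{s_{yy}} = \dfrac{S_{xy}}{S_{yy}}$

• Linear correlation

The product-moment correlation coefficient, r, is a measure of the strength of the linear correlation, where $-1 \le r \le 1$.

[Scatter diagrams illustrating different strengths of positive and negative correlation]

• Formulae to calculate r
$$r = \frac{s_{xy}}{s_x s_y} \quad \text{where } s_x = \sqrt{s_{xx}} \text{ and } s_y = \sqrt{s_{yy}}$$
$$r = \frac{S_{xy}}{S_x S_y} \quad \text{where } S_x = \sqrt{S_{xx}} \text{ and } S_y = \sqrt{S_{yy}}$$

• Relationship between r and the regression coefficients
Regression coefficient of y on x is b, where $b = \dfrac{s_{xy}}{s_{xx}} = \dfrac{S_{xy}}{S_{xx}}$
Regression coefficient of x on y is d, where $d = \dfrac{s_{xy}}{s_{yy}} = \dfrac{S_{xy}}{S_{yy}}$
r can be found using $r^2 = b \times d$, taking r to have the same sign as b and d.

• Spearman's coefficient of rank correlation, $r_s$
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where n is the number of pairs of values and d = rank x − rank y.

$-1 \le r_s \le 1$, where $r_s = 1$ means that the rankings are in perfect agreement,
and $r_s = -1$ means that the rankings are in exact reverse order.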


Miscellaneous worked examples 


Example 2.13 


An old film is treated with a chemical in order to improve the contrast. Preliminary tests on 
nine samples drawn from a segment of the film produced the following results. 


Sample   A    B    C    D    E    F    G    H    I
x        1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0
y        49   60   66   62   72   64   89   90   96


The quantity x is a measure of the amount of chemical applied, and y is the contrast index, 
which takes values between 0 (no contrast) and 100 (maximum contrast). 


(a) Plot a scatter diagram to illustrate the data. 
(b) It is subsequently discovered that one of the samples of film was damaged and produced 
an incorrect result. State which sample you think this was. 


In all subsequent calculations this incorrect sample is ignored. The remaining data can be 
summarised as follows: 

Σx = 23.5, Σy = 584, Σx² = 83.75, Σy² = 44 622, Σxy = 1883, n = 8.
(c) Calculate the product moment correlation coefficient. 


(d) State, with a reason, whether it is sensible to conclude from your answer to part (c) that 
x and y are linearly related. 


(e) The line of regression of y on x has equation y = a + bx. Calculate the values of a and b, each correct to three significant figures.

(f) Use your regression equation to estimate what the contrast index corresponding to the damaged piece of film would have been if the piece had been undamaged.

(g) State, with a reason, whether it would be sensible to use your regression equation to estimate the contrast index when the quantity of chemical applied to the film is zero.


Solution 2.13 


(a) Scatter diagram 


(b) Sample F was damaged. 


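(c) $\bar{x} = \dfrac{\sum x}{n} = \dfrac{23.5}{8} = 2.9375$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{584}{8} = 73$

To calculate r:
Using small s format
$$s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{1883}{8} - 2.9375 \times 73 = 20.9375$$
$$s_{xx} = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{83.75}{8} - 2.9375^2 = 1.839\ldots$$
$$s_{yy} = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{44\,622}{8} - 73^2 = 248.75$$
$$r = \frac{s_{xy}}{s_x s_y} = \frac{20.9375}{\sqrt{1.839\ldots}\,\sqrt{248.75}} = 0.9787\ldots$$

Using big S format
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 1883 - \frac{23.5 \times 584}{8} = 167.5$$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 83.75 - \frac{23.5^2}{8} = 14.71\ldots$$
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 44\,622 - \frac{584^2}{8} = 1990$$
$$r = \frac{S_{xy}}{S_x S_y} = \frac{167.5}{\sqrt{14.71\ldots}\,\sqrt{1990}} = 0.9787\ldots$$

So r = 0.98 (2 s.f.).

(You should try this on your calculator, using LR mode.)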


(d) Yes, it is sensible to conclude that x and y are linearly related. Since r is very close to 1, it would appear to indicate a very strong positive linear correlation.
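(e) For the regression line $y = a + bx$, $a = \bar{y} - b\bar{x}$ and
$$b = \frac{s_{xy}}{s_{xx}} = \frac{20.9375}{1.839\ldots} = 11.38\ldots$$
or
$$b = \frac{S_{xy}}{S_{xx}} = \frac{167.5}{14.71\ldots} = 11.38\ldots$$
$$a = \bar{y} - b\bar{x} = 73 - 11.38\ldots \times 2.9375 = 39.57\ldots$$
$$y = 39.6 + 11.4x \text{ (3 s.f.)}$$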


(f) When x = 3.5, y = 39.57... + 11.38... × 3.5 = 79 (2 s.f.)
The contrast index would have been 79.
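Parts (c), (e) and (f) can be reproduced from the raw data with a few lines of Python. This is a sketch using the big S format; the variable names are our own, and sample F (x = 3.5, y = 64) is left out, as in the question.

```python
import math

# Example 2.13 data with the damaged sample F (x = 3.5, y = 64) omitted
x = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0]
y = [49, 60, 66, 62, 72, 89, 90, 96]
n = len(x)

# Big S format: S_xy = Σxy - ΣxΣy/n, S_xx = Σx² - (Σx)²/n, S_yy = Σy² - (Σy)²/n
S_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
S_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
S_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n

r = S_xy / math.sqrt(S_xx * S_yy)   # product-moment correlation coefficient
b = S_xy / S_xx                      # gradient of the regression line y on x
a = sum(y) / n - b * sum(x) / n      # a = y_bar - b * x_bar

print(f"r = {r:.4f}")                                  # r = 0.9787
print(f"y = {a:.3g} + {b:.3g}x")                       # y = 39.6 + 11.4x
print(f"estimate at x = 3.5: {a + b * 3.5:.0f}")       # 79
```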


(g) No it would not be sensible to use the regression equation when x = 0, since this is outside 
the range of data. Extrapolating outside the data is unreliable. 


Example 2.14 
The rules for a flower competition at a village fête are as follows.


Three judges each give a score out of 100 to each entry. The two judges whose rankings 
are in closest agreement are identified, and their scores for each entry are added. The 
three prize-winners are those whose total scores from these two judges are the highest. 
The scores of the third judge are ignored. 


The judges awarded marks as shown in the table below. 


Contestant   A   B   C   D   E   F   G
Judge X      89  83  80  72  69  54  41
Judge Y      77  84  85  65  79  72  69
Judge Z      73  83  89  80  67  75  69


The value of Spearman’s rank correlation coefficient between X and Y is 0.5, and between X 
and Z is 0.46, correct to two decimal places. Calculate the value of Spearman’s rank 
correlation coefficient between judges Y and Z, and hence establish which were the three 
prize-winners. (C) 




Solution 2.14 


Ranking the lowest score 1:

               A  B  C   D   E  F  G
Rank Judge Y   4  6  7   1   5  3  2
Rank Judge Z   3  6  7   5   1  4  2
|d|            1  0  0   4   4  1  0
d²             1  0  0  16  16  1  0    Σd² = 34

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 34}{7 \times 48} = 0.39 \text{ (2 d.p.)}$$


Spearman's rank correlation coefficients are:
between X and Y: 0.5, between X and Z: 0.46, between Y and Z: 0.39.
The two judges whose rankings are in closest agreement are X and Y, so Judge Z is ignored.

Adding together the scores for X and Y, the final scores are:

A     B     C     D     E     F     G
166   167   165   137   148   126   110


The three prize winners are A, B and C. 
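The competition rule in Example 2.14 can be sketched in Python as follows. The helper names are our own. Here the highest score is ranked 1, whereas the solution above ranks the lowest score 1; since the same rule is used for every judge, the values of d² and hence $r_s$ are unchanged. The simple ranking assumes no judge gives two contestants the same score, which holds for these data.

```python
def ranks_descending(scores):
    """Rank 1 for the highest score (illustrative helper; assumes no tied scores)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


def spearman_rs(xs, ys):
    """Spearman's r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) from two lists of scores."""
    n = len(xs)
    rank_x, rank_y = ranks_descending(xs), ranks_descending(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


contestants = ["A", "B", "C", "D", "E", "F", "G"]
scores = {
    "X": [89, 83, 80, 72, 69, 54, 41],
    "Y": [77, 84, 85, 65, 79, 72, 69],
    "Z": [73, 83, 89, 80, 67, 75, 69],
}

# r_s for every pair of judges
pairs = [("X", "Y"), ("X", "Z"), ("Y", "Z")]
r_s = {pair: spearman_rs(scores[pair[0]], scores[pair[1]]) for pair in pairs}
for (j1, j2), value in r_s.items():
    print(f"{j1} and {j2}: r_s = {value:.2f}")

# Keep the pair in closest agreement and add their scores for each contestant
first, second = max(r_s, key=r_s.get)
totals = {c: scores[first][i] + scores[second][i] for i, c in enumerate(contestants)}
winners = sorted(totals, key=totals.get, reverse=True)[:3]
print("Prize-winners:", winners)   # ['B', 'A', 'C']
```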




Example 2.15 
[Scatter diagram: weight y (kg) against length h (cm)]


A mother monitored the growth of her baby and recorded the length h cm and weight y kg at various stages in the baby's development. The results were as follows.

h   50    58    63    68    82     88     96
y   4.43  4.88  6.31  7.18  10.63  13.60  17.95

The mother thought that a model of the form
$$y = p + qh$$
where p and q are constants, might be suitable to describe the relationship between y and h.


The diagram shows a scatter diagram of these data. 


(a) Comment on the suggested model. 


(b) Suggest, giving reasons, a better model to represent the relationship between y and h. 
The new variable $x = \dfrac{h^3}{10\,000}$ is calculated and the values of x and y are given in the table below.

x   12.5  19.5  25.0  31.4  55.1   68.1   88.5
y   4.43  4.88  6.31  7.18  10.63  13.60  17.95

(c) On graph paper plot a scatter diagram of y against x and comment on whether a linear relationship between y and x is likely to provide a suitable model for the relationship between y and x.

(d) Obtain the regression line of y on x.
[You may use Σx² = 17 653.33 and Σxy = 3634.185]


(e) Estimate the weight of the baby when it was 75 cm long. 


Solution 2.15 


(a) A model of the form y = p + qh suggests a linear relationship.

The graph, however, appears to suggest that the data could be modelled by a curve, though a straight line might be possible.

(b) Since the data suggest a curve, a curve such as $y = kh^3$ or $y = ke^h$ might be a better model.

(c) [Scatter diagram of y against x]

The scatter diagram of y against x suggests that a linear model would be a reasonable fit.

(d) Σx = 300.1, Σy = 64.98

Equation of regression line y on x is y = a + bx




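To find b using small s format
$$s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{3634.185}{7} - \frac{300.1}{7} \times \frac{64.98}{7} = 121.1999\ldots$$
$$s_{xx} = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{17\,653.33}{7} - \left(\frac{300.1}{7}\right)^2 = 683.944\ldots$$
$$b = \frac{s_{xy}}{s_{xx}} = \frac{121.1999\ldots}{683.944\ldots} = 0.1772\ldots$$

To find b using big S format
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 3634.185 - \frac{300.1 \times 64.98}{7} = 848.399\ldots$$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 17\,653.33 - \frac{300.1^2}{7} = 4787.614\ldots$$
$$b = \frac{S_{xy}}{S_{xx}} = \frac{848.399\ldots}{4787.614\ldots} = 0.1772\ldots$$

$$a = \bar{y} - b\bar{x} = \frac{64.98}{7} - 0.1772\ldots \times \frac{300.1}{7} = 1.685\ldots$$

Giving values to three significant figures, the equation of the regression line is
$$y = 1.69 + 0.177x$$

(e) When h = 75, $x = \dfrac{75^3}{10\,000} = 42.1875$.
When x = 42.1875, y = 1.69 + 0.177 × 42.1875 = 9.16 (3 s.f.)

When the baby is 75 cm long, an estimate of the weight is 9.16 kg.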
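The whole of parts (d) and (e) of Example 2.15, including the transformation $x = h^3/10\,000$, can be sketched in Python as below. The variable names are our own; x is rounded to one decimal place so that it matches the tabulated values used in the solution.

```python
h = [50, 58, 63, 68, 82, 88, 96]
y = [4.43, 4.88, 6.31, 7.18, 10.63, 13.60, 17.95]

# Transformed variable x = h^3 / 10 000, rounded to 1 d.p. as in the table
x = [round(hi ** 3 / 10_000, 1) for hi in h]
n = len(x)

# Big S format
S_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
S_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n

b = S_xy / S_xx                    # gradient of the regression line y on x
a = sum(y) / n - b * sum(x) / n    # a = y_bar - b * x_bar
print(f"y = {a:.3g} + {b:.3g}x")   # y = 1.69 + 0.177x

x_75 = 75 ** 3 / 10_000            # transformed value for a length of 75 cm
estimate = a + b * x_75
print(f"x = {x_75}, estimated weight = {estimate:.2f} kg")   # 9.16 kg
```

As in the solution, the estimate is an interpolation: 75 cm lies within the range of lengths actually measured.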


Miscellaneous exercise 2d 


1. A set of bivariate data can be summarised as follows:
=6, ix = 21, Ly =43, : os 
Set OL, yt 335, Exy= (7A. . : 
(a) Calculate the equation of the regression line of y on x. Give your answer in the form y = a + bx, where the values of a and b should be stated correct to three significant figures.

(b) It is required to estimate the value of y for a given value of x. State circumstances under which the regression line of x on y should be used, rather than the regression line of y on x. (C)


2. It is known that the wind causes a 'chill factor', so that the human body feels the temperature to be lower than the actual temperature. The following table gives the perceived temperature (t °F) for different wind speeds (w miles per hour) when the actual temperature is 25 °F.

[n = 11, Σw = 275, Σw² = 9625, Σt = −28, Σt² = 2306, Σwt = −3045]


(a) Calculate the equation of a suitable regression line from which a value of t can be estimated for a given value of w. Simplify your answer as far as possible, giving the constants correct to three significant figures.

(b) Use your equation to estimate the perceived temperature when the wind speed is
(i) 38 miles per hour,
(ii) 55 miles per hour.

(c) Calculate the value of the product-moment correlation coefficient for the data, and state what this indicates about the data.

(d) Comment on the reliability of the two estimates found in (b). (C)


3. The following data were collected during a study, under experimental conditions, of the effect of temperature, x °C, on the pH, y, of skimmed milk.

Temperature (x °C)   pH (y)

(a) Making reference to the following scatter diagram for these data, explain what it reveals about the relationship between x and y.

[Scatter diagram of pH (y) against temperature (x °C)]

(b) Determine the equation of the least squares regression line of y on x.
[You may make use of the following information:
Σx = 511, Σy = 78.52, Σx² = 28 949, Σxy = 3291.88]

(c) Interpret your values for the gradient and intercept of the regression line found in (b).

(d) Estimate the pH of skimmed milk at 20 °C and at 95 °C. In each case indicate, with a reason but without further calculation, how reliable you think these estimates might be.

(e) Find the temperature at which you would expect skimmed milk to have a pH of 6.5. (NEAB)


. The price £x of a certain cassette recorder is 


increased by £2 every six months. The number of 
recorders sold during the six months before the 
next increase is y thousand. The values covering 
eight consecutive periods are shown in the table. 


40 42 44 46 48 50 52 54


12.80 11.6.0-19.3°10.3-° 10.2.5 9.0 8.99.2, 


[Σx = 376, Σx² = 17 840, Σy = 83.9,
Σy² = 893.33, Σxy = 3898.4]

(a) Plot a scatter diagram for the data. 

(b) Obtain, in the form y =a + bx, the equation 
of the regression line of y on x, giving the 
values of a and b correct to three significant 
figures. Plot this line on your scatter 
diagram. 

(c) Calculate an estimate of the number of 
recorders sold when the price is £58, and 
comment on the reliability of your estimate. 

(d) Without further calculation, state whether 
the regression line of x on y will be the same 
as the line plotted in part (b). Give a reason 
for your answer. (C) 
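As a sketch of part (b), the least squares coefficients can be computed straight from the bracketed sums. It uses S_xx = Σx² − (Σx)²/n and S_xy = Σxy − (Σx)(Σy)/n with n = 8 periods. The prices implied by Σx = 376 and Σx² = 17 840 are £40, £42, up to £54. The helper name `regression_from_sums` is hypothetical, and the output is meant only as a check on hand working.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the least squares line y = a + bx of y on x."""
    s_xx = sum_x2 - sum_x ** 2 / n        # S_xx = Σx² − (Σx)²/n
    s_xy = sum_xy - sum_x * sum_y / n     # S_xy = Σxy − (Σx)(Σy)/n
    b = s_xy / s_xx                       # gradient
    a = sum_y / n - b * sum_x / n         # the line passes through (x̄, ȳ)
    return a, b

# Question 4: eight six-month periods
a, b = regression_from_sums(8, 376, 83.9, 17840, 3898.4)
print(f"y = {a:.2f} + ({b:.3f})x")

# Part (c): £58 is above every price in the data, so this is an extrapolation
estimate = a + b * 58
print(f"Estimated sales at £58: {estimate:.2f} thousand")
```

The same helper applies to any question in this exercise that supplies n, Σx, Σy, Σx² and Σxy.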


5. Explain, briefly, your understanding of the term
‘correlation’.

Describe how you used, or could have used, 
correlation in a project or in classwork. 
Twelve students sat two Biology tests, one 
theoretical and one practical. Their marks are 
shown in the table. 


Marks in theoretical Marks in practical
test (T) test (P)
5 6
9 8
7 9
11 13
20 20
4 9
6 8
17 17
12 14
10 8
15 17
16 18


(a) Draw a scatter diagram to represent these 
data. 


(b) Find, to three decimal places, the 
product-moment correlation coefficient. 

(c) Using evidence from (a) and (b) explain why 
a straight line regression model is 
appropriate for these data. 

Another student was absent from the practical
test but scored 14 marks in the theoretical test.

(d) Find the equation of the appropriate
regression line and use it to estimate a mark 
in the practical test for this student. (L) 


6. (a) State the quantity which is minimised when
using the method of least squares. Use a 
sketch to illustrate your answer. 

The heat output of wood is known to vary 
with the percentage moisture content. The 
table below shows, in suitable units, the 
data obtained from an experiment carried 
out to assess this variation. 


Moisture content (%)    Heat output (y)

$0 5.5 
8 TA 
34 6.2 
22. 6.8 
458 55 
15 7S. 
4 44 
82 3.9. 
60 4,9 
30 6.3 


(b) Obtain the equation of the regression line 
for heat output on percentage moisture 
content, giving the values of the coefficients 
to two decimal places. 

(c) Use your equation to estimate the heat 
output of wood with 40% moisture content. 
State any reservations you would have about 
making an estimate from the regression 
equation of the heat output for a 90% 
moisture content.

(d) Explain briefly the main implication of your 
analysis for a person wishing to use wood as 
a form of heating. (L)


7. In the machine sewing section of a factory
making high fashion clothes, a score is assigned
to each finished item on the basis of its quality
(the better the quality, the higher the score). Each
seamstress’s pay is, in part, dependent upon the
number of items she finishes. The number of
items finished by each of 12 seamstresses on a
particular day and their mean quality score are
shown.


             Number of items    Mean quality
Seamstress   finished, x        score, y

1 14 7.2

2 13 7.3. 

3 17 6.9 

4 16 G3. 

5 17 2S 

6 18 7.6 

7 19 6.8 

8 32 3.7

9 18 6.5 
10 15 79 
11 15 6.8
12 19 7A 

Σx = 213, Σy = 82.6, Σxy = 1414.1,
Σx² = 4043, Σy² = 584.28.


(a) Calculate the value of the product-moment
correlation coefficient between x and y, and
interpret your value.

(b) Plot these data on a scatter diagram.
Discuss, briefly, whether or not your
interpretation in (a) should now be
amended.

(c) When the results were presented at a Board 
meeting, the Personnel Manager explained 
that Seamstress 8 had been experiencing 
severe financial difficulties at home. 
Explain, briefly, the implications of this 
additional information on your conclusions. 


(NEAB) 
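For part (a), r can be found from the sums using r = S_xy / √(S_xx S_yy). The sketch below does only the arithmetic, and its function name is hypothetical. The scatter diagram in part (b) is still needed, because a single unusual point such as seamstress 8 can dominate the value of r.

```python
import math

def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product moment correlation coefficient computed from raw sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Question 7: n = 12 seamstresses
r = pmcc_from_sums(12, 213, 82.6, 4043, 584.28, 1414.1)
print(f"r = {r:.3f}")
```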


8. A purchasing manager of a London-based
company believes that the time in transit of
goods sent by road depends upon the distance
between the supplier and the company. In an
attempt to measure this dependence, twelve
packages, sent from different parts of the
country, have their transit times (y days)
accurately recorded, together with the distance
(x miles) of the supplier from the company. The
results are summarised as follows:


Σx = 1800, Σy = 36.0, Σxy = 6438.6,
Σx² = 336 296, Σy² = 126.34.


Obtain the least squares straight line regression
equation of y on x.
Explain the significance of the regression
coefficient.

Predict the transit time of a package sent from a 
supplier 200 miles away from the company. 

Give two reasons why you would not use the 
equation to predict transit time for a package 
sent from a supplier 1500 miles away. 

Calculate the product-moment correlation 
coefficient between x and y. 

Explain why the value you have obtained 
supports the purchasing manager’s attempt to 
establish a regression equation of y on x. (AEB) 


9. The government of a country considered making
an investment to decrease the number of
members of the population per doctor in order to
try to reduce its infant mortality rate. (Infant
mortality is measured as the number of infants
per 1000 who die before reaching the age of
five.) A study was made of several other similar
countries and the variables x, population per
doctor, and y, infant mortality, were examined.
The data are summarised by the following
statistics:


x̄ = 440.57, ȳ = 8.00,
Sy = 174 567.71, Syy = 1598.00.

(a) Calculate the equation of the regression line
of y on x.

(b) Given that the country at present has 380
people per doctor, estimate the infant
mortality.
(c) Comment on the coefficient of x in the light
of the government’s plans. (L)


10. Students on a French course were given an oral
test, a listening test and a written test. The test
results for the eight students on the course are
given in the table. For the oral test, students were
given a grade on a scale ranging from A, through
A-, B+, B etc. down to D-. For the listening test
they were given a mark out of 25, and for the
written test they were given a mark out of 100.


Student 1 2 3 4 5 6 7 8
Oral test
grade C- C+ B- A- C B D+ C
Listening
test mark (x) 10 21 22 19 17 14 13 16
Written
test mark (y) 34 76 74 60 68 44 45 53


Σx = 132, Σx² = 2296, Σy = 454,
Σy² = 27 402, Σxy = 7909.
(a) Calculate the value of the most appropriate
measure of correlation between the results in
the oral and listening tests, justifying your
choice of measure. Interpret the value you
obtain.
(b) Calculate the value of the most appropriate
measure of correlation between the results in
the listening and written tests, justifying
your choice of measure. Interpret the value
you obtain.
(c) The appropriate measure of correlation
between the results in the oral and written
tests has a value of 0.339. Comment on the
indications given by the values of the three
correlation coefficients about the
performances of the students in the tests.
(NEAB)
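For part (b), both sets of results are numerical marks, so the product moment correlation coefficient is a natural candidate. The oral grades in part (a) are only ordinal, which points towards Spearman's coefficient there. This sketch recomputes the bracketed sums from the marks as read from the table and then finds r.

```python
import math

# Listening (x) and written (y) marks for students 1 to 8, read from the table
x = [10, 21, 22, 19, 17, 14, 13, 16]
y = [34, 76, 74, 60, 68, 44, 45, 53]

n = len(x)
sums = (sum(x), sum(xi * xi for xi in x), sum(y), sum(yi * yi for yi in y),
        sum(xi * yi for xi, yi in zip(x, y)))
print(sums)  # (132, 2296, 454, 27402, 7909), matching the given totals

s_xx = sums[1] - sums[0] ** 2 / n
s_yy = sums[3] - sums[2] ** 2 / n
s_xy = sums[4] - sums[0] * sums[2] / n
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.3f}")
```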


11. The table below shows the names of five toy
construction kits which were bought from a
catalogue, the numbers of pieces, n, found in
each, and the corresponding prices paid, £p.

Name Set 1 Set 3 Set 4 Set 5 Set 6
n 11 21 28 37 75
p 11 26 34 41 88
[Σn = 172, Σp = 200, Σn² = 8340,
Σp² = 11 378, Σnp = 9736.]

(a) Plot a scatter diagram of the data, with n on
the horizontal axis and p on the vertical axis.

(b) Calculate the equation of the regression line
of p on n, and plot this line on your scatter
diagram. Use your equation to estimate the
price of Set 2, which is not listed in the
catalogue, but is thought to have 15 pieces.
Give your answer correct to the nearest
pound.

(c) Calculate the product moment correlation
coefficient for the given data, giving your
answer correct to three decimal places, and
interpret the result in terms of your scatter
diagram. (C)

12. The number of hours x (correct to the nearest
half-hour) spent studying for an examination by
12 students, together with the marks y achieved in
the examination, are given in the following table.
x y
2 44
3 50
4 60
4.5 54
5 65
6 73
6.5 81
8 89
8.5 84
9 90
9.5 103
10 120
[Σx = 76, Σx² = 560, Σy = 913,
Σy² = 75 153, Σxy = 6425.]


(a) Calculate the product moment correlation 
coefficient r for the data. 

(b) State what the value of r indicates about the 
relation between x and y. 

(c) The value of Spearman’s rank correlation 
coefficient for the above data is 0.986, 
correct to three decimal places. For the next 
examination the students each increased 
their study time by one hour and there was 
an increase of five marks in each of their 
examination scores. Without further 
calculation, state whether the new value of 
the rank correlation coefficient, correct to 
three decimal places, is less than, equal to or 
greater than 0.986. Give a reason for your
answer. (C)
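Part (c) turns on the fact that Spearman's coefficient depends only on ranks. The sketch below assumes there are no tied values, which holds for this table. It computes r_s = 1 − 6Σd²/(n(n² − 1)) before and after adding one hour and five marks to every student.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman's coefficient 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied data."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

hours = [2, 3, 4, 4.5, 5, 6, 6.5, 8, 8.5, 9, 9.5, 10]
marks = [44, 50, 60, 54, 65, 73, 81, 89, 84, 90, 103, 120]

rs = spearman(hours, marks)
# Part (c): every student studies one hour longer and scores five more marks
rs_new = spearman([h + 1 for h in hours], [m + 5 for m in marks])
print(round(rs, 3), round(rs_new, 3))   # the ranks, and so r_s, are unchanged
```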
13. Over a period of ten years a survey was done on
the number of cars owned per person in a
particular county. The results are given in the
table below.
Year   x = year − 1984   y = no. of cars per person
1984   0   0.33
1985   1   0.35
1986   2   0.36
1987   3   0.37
1988   4   0.38
1989   5   0.39
1990   6   0.39
1991   7   0.40
1992   8   0.41
1993   9   0.41
You are given that Σxy = 17.76, Σx = 45 and
Σy = 3.79.
(a) Calculate the covariance of x and y, giving
your answer correct to three decimal places.
You are also given that the variance of the
x-values is 8.25.
(b) Calculate the equation of the regression line
of y on x.
(c) State the value of y which the regression
equation found in part (b) predicted for the
year 2000.
(d) Comment on the reliability of this
prediction. (O)
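This question uses the ‘divide by n’ forms cov(x, y) = Σxy/n − x̄ȳ and var(x) = Σx²/n − x̄², which is what gives the stated variance of 8.25. A minimal sketch of parts (a) to (c), working from the table, is:

```python
x = list(range(10))                     # x = year - 1984, for 1984 to 1993
y = [0.33, 0.35, 0.36, 0.37, 0.38, 0.39, 0.39, 0.40, 0.41, 0.41]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
cov_xy = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar  # Σxy/n − x̄ȳ
var_x = sum(xi * xi for xi in x) / n - x_bar ** 2                  # Σx²/n − x̄²

b = cov_xy / var_x                      # gradient of the line of y on x
a = y_bar - b * x_bar
prediction_2000 = a + b * (2000 - 1984)  # x = 16, well beyond the data (x = 0 to 9)

print(f"cov(x, y) = {cov_xy:.4f}, var(x) = {var_x}")
print(f"y = {a:.4f} + {b:.5f}x, prediction for 2000: {prediction_2000:.3f}")
```

The year 2000 lies well outside the surveyed years, which is the point behind part (d).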
14. The table is a summary of the maximum
temperature recorded in Plymouth during each of
the seven months from June to December 1986
inclusive.

Month   x   Maximum temp °C
Jun     1   22.3
Jul     2   20.2
Aug     3   17.9
Sep     4   16.1
Oct     5   16.8
Nov     6   12.6
Dec     7   10.9

(a) Plot a scatter diagram of the data using as
x coordinates the coding shown in the table
and the maximum temperature as the
y coordinate. Mark the mean point of the
data on your graph.
(b) Given that Σxy = 416.7, demonstrate that the
gradient of the line of regression of y on x is
−1.80 (to three significant figures). What is
the physical meaning of this gradient?
(c) Calculate the full equation of regression of
maximum temperature on month.
(d) Use your equation to predict the maximum
temperature in May 1987. The actual
maximum temperature was 45.3 °C. Why is
your predicted value so different from
reality? (O)
15. The following table gives the daily output of the
substance creatinine from the body of each of ten
nutrition students together with the student’s
body mass.

Output of creatinine   Body mass
(grams)                (kilograms)

4.32, 5S 

4:54 48. 

LAS 55 

4.06 53 

2.13 74 

1.00 44 

0.90 49 

2.00 68 

2.70 78 

0:75 St 
Draw a scatter diagram for the data. 
Calculate, correct to two decimal places, the 
product-moment correlation coefficient. 
Comment on any relationship which is indicated 
by the scatter diagram and the correlation 


coefficient. (NEAB) 


16. The yield of a particular crop on a farm is 
thought to depend principally on the amount of 
rainfall in the growing season. The values of the 
yield y, in tons per acre, and the rainfall x, in 


centimetres, for seven successive years are given 
in the table below. 


ADB ABV IAS 14.2 °-13.2 14.4 12.0 


y 6.25 8.02 8.42 5.27 7.21 8.71 5.68


[Σxy = 654.006, Σx = 91, Σx² = 1191.72,
Σy = 49.56, Σy² = 362.1628]

(a) Find the linear (product-moment) 
correlation coefficient between x and y. 

(b) Find the equation of the least squares
regression line of y on x and also that of
x on y.

(c) Given that the rainfall in the growing season
of a subsequent year was 44.0 cm, estimate
the yield in that year.

(d) Given that the yield in a subsequent year was 
8.08 tons per acre, estimate the rainfall in 
the growing season of that year. (C) 
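Part (b) asks for two different lines. The sketch below reads Σy as 49.56 and shows why the lines differ. The line of y on x uses S_xy/S_xx and minimises vertical deviations, while the line of x on y uses S_xy/S_yy and minimises horizontal ones. The product of the two gradients equals r².

```python
import math

n, sum_x, sum_y = 7, 91, 49.56
sum_x2, sum_y2, sum_xy = 1191.72, 362.1628, 654.006

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
x_bar, y_bar = sum_x / n, sum_y / n

b_yx = s_xy / s_xx                 # line of y on x: use it to estimate yield, part (c)
a_yx = y_bar - b_yx * x_bar
b_xy = s_xy / s_yy                 # line of x on y: use it to estimate rainfall, part (d)
a_xy = x_bar - b_xy * y_bar

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"y = {a_yx:.3f} + {b_yx:.3f}x")
print(f"x = {a_xy:.3f} + {b_xy:.3f}y")
print(f"r = {r:.3f}")
```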


17. Following a leak of radioactivity from a nuclear 
power station an index of exposure to 
radioactivity was calculated for each of seven 
geographical areas close to the power station. 


In the subsequent five years the incidence of
death due to cancer (measured in deaths per
100 000 person-years) was recorded. The data
were as follows:

Area   Index (x)   Deaths (y)
1      7.6         62
2      23.2        75
3      3.2         51
4      16.6        72
5      5.2         39
6      6.8         43
7      5.0         55

[Σx = 67.6, Σx² = 980.08, Σy = 397,
Σy² = 23 649, Σxy = 4339.8]

(a) Find the estimated regression line of y on x.
(b) In another geographical area close to the
power station the index of exposure was 6.0.
Use the estimated regression line to predict
the incidence, in this area, of death due to
cancer (in deaths per 100 000 person-years).
(c) Estimate the incidence of death due to
cancer (in deaths per 100 000 person-years)
there would have been if there had been no
leak from the power station (i.e. if the index
of exposure to radioactivity were zero). (C)


18. Suggest a value for the product-moment 
correlation coefficient between x and y in each of 
the following cases. 


[Scatter diagrams of y against x, one for each case, starting with (a).]


19. For twelve consecutive months a factory
manager recorded the number of items produced
by the factory and the total cost of their
production. The following table summarises the
manager’s data.

Number of items   Production cost
(x) thousands     (y) £1000
18   37
36   54
45   63
22   42
69   84
72   94
13   33
33   49
59   79
79   98
10   32
53   7A

(a) Draw a scatter diagram for the data.
(b) Give a reason to support the use of the
regression line
(y − ȳ) = b(x − x̄)
as a suitable model for the data.
(c) Giving the values of x̄, ȳ and b to three
decimal places, obtain the regression
equation for y on x in the above form.
(You may use Σx² = 27 963, Σxy = 37 249.)
(d) Rewrite the equation in the form
y = a + bx,
giving a to three significant figures.
(e) Give a practical interpretation of the values
of a and b. (L)

20. An electric fire was switched on in a cold room
and the temperature of the room was noted at
five-minute intervals.


Time, minutes, from
switching on fire, x    Temperature, °C, y
0    0.4
5    1.5
10   3.4
15   5.5
20   7.7
25   9.7
30   11.7
35   13.5
40   15.4

You may assume that Σx = 180, Σy = 68.8,
Σxy = 1960, Σx² = 5100.

(a) Plot the data on a scatter diagram.
(b) Calculate the regression line y = a + bx and
draw it on your scatter diagram.
(c) Predict the temperature 60 minutes from
switching on the fire.
Why should this prediction be treated with
caution?
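A sketch for this question fits y = a + bx to the tabulated readings. It then checks numerically that the fitted line minimises the sum of squared vertical deviations, which is the quantity asked about in question 6(a). The temperatures used reproduce the given Σy = 68.8 and Σxy = 1960. The function name `sse` is illustrative.

```python
# Times (minutes) and room temperatures (°C) as read from the table
x = [0, 5, 10, 15, 20, 25, 30, 35, 40]
y = [0.4, 1.5, 3.4, 5.5, 7.7, 9.7, 11.7, 13.5, 15.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_bar - b * x_bar

def sse(a_, b_):
    """Sum of squared vertical deviations of the points from y = a_ + b_ x."""
    return sum((yi - (a_ + b_ * xi)) ** 2 for xi, yi in zip(x, y))

# Moving either coefficient away from the least squares values increases the sum
best = sse(a, b)
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.01), (0, -0.01)]:
    assert sse(a + da, b + db) > best

print(f"y = {a:.3f} + {b:.4f}x")
print(f"Predicted temperature after 60 minutes: {a + b * 60:.1f} °C")
```

The 60-minute value extrapolates well beyond the 40 minutes observed. A room heated by a fire cannot warm linearly for ever, which is why the prediction needs caution.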
Mixed test 2A Mixed test 2B
X and Y, were asked to give marks i 1, The average trade-in value of a particular make Give a reason why this estimate differs from the 
i ee poets he see one Pe * ba ere ae brands of fish finger. The ! uf ai car cepa with time according to actual number of hours of sunshine on May Sth. 
i i to seven , i : ; : 
ee Licaniasy at ‘ als shows the yield of results are recorded in the table. pelos cin whieh the values of x may Explain the concept of least squares by reference 
ie ents per acre. oF \ to your scatter diagram and the regression line of 
y Brand Avi B cD | Age (x years) Value (£y thousand) y on x. (C) 
Amount of water (x) Yield of hay (y) Xp mark 8 10° 18.2 1 4-15 i 2.0 6.10 3. Acar manufacturer is testing the braking 
30. 485 Ys marke 14 12 9 4 Pid { 2.5 S555. distance for a new model of car. The table shows 
5.20 | 3.0 5.09 the braking distance, y metres, for different 
45 : Construct a table of ranks and calculate ! 3.5 4,65 speeds, x km/h, when the brakes were applied. 
60 3:78 Spearman’s rank correlation coefficient. (C) 45 3.89 
75. 6.60 at 50 3.51 Speed of car, 
90 7.35 4. Values a aH y ere bivariate data are 20 Sai x keni/h 30° 50°70 90. 110°” 430 
given in the follow: . 
105 ae ee 7.0 2.50 Braking distance, 
120 7. SR Ye ; [n=8, Yxr=335, Ly=346, Ex*=161.75, y metres 35 $0 85° 185 235. 350 
04 4.97 | Ly? = 160.2014, Lexy = 130.035.] (to the nearest 5 metres) 
(Use Cit ~ Buss Mie pega tine of y 02 1.94 \ (a) Calculate the product moment correlation 
(a) Find ¢! ayaa ants 03 1.39 poceeet ier and y, and state what Yx=480,  Ux?=45400,  Ly=900, 
b Teroret the Eoclacienis of your regression 0.4 1.82 ‘lL Materia eds about a scatter diagram Xy?=212100, — Exy= 94 S00. 
(b) in erp’ ae 193 i: I ed e data, bei (a) Plot a scatter diagram. : 
a - atuitven peeneeta yeas be for 2 463 (b) Itis AG Cal o ee the va. ot of ae (b) Calculate the equation of the regression line ! 
fry and re te ets0? Comment on the o : i - Pa arcuate t Aunt oe of a suitable of y on x and draw the line on your scatter 
i bility of each of your predicted yields. 0.7 1.49 ine Teaieane and use it to obtain the diagram. 
reliability L) 0.8 1.34 required estimate. ‘ (c} Use your regression equation to predict 
0.9 1.17 Ghee tee e of values of y when x = 100 and x = 150. 
. ‘ ee regression in the context of this situation, C ith he likel 
t, a bottle of milk was : : ‘omment, with reasons, on the likely 
# eee ot oom into a warm room. Its (n=9, Ex=4.5, By= 14.97, Ex?=2.85, epee pena Se accuracy of these predictions. 
ais tative y °C, was recorded at £ minutes Ly? = 25.5309, Exy= 6.885. . aig ble. une ip! equation:toobtain:a (d) Discuss briefly whether the regression line 
are it was brought in, for 11 different values (a) Calculate the product moment correlation rellable fen i 0 provides a good model or whether there is a 
f t. The results are summarised as: coefficient for this data and state what its (i) y hes am 7 oo. better way of modelling the relationship 
ae y value tells you about the relationship (ii) x when y = 3.00. (C) between y and x. (MEI 
we=44, E2=180.4, Lty= 824.5, between x and y. ; a ; 
Ly = 205. 2, ite following table gives x, the number of hours 4. In the two rounds of a show-jumping 
: . of sunshine, and y, the mid-day temperature in Bie i i ‘ 
Iculate the equation of the Jine of “4 =, > : 7 competition, seven riders recorded times, in 
Rene of hae tin the form y =a + bt. x xK »x Geeat Speteigeen on thie Abet eevett aye May: seconds, given in the following table. ; 
(b) Explain the practical significance of the xX x Houts of Mid-day ; 
value of a. mh oy Date sunshine, x. temperature, y°C Rider, Ae: Gud E EG 
(c) Use your equation to estimate the values of 2 z ip sy 
yatt=4.5 and t= 20.0. x May ist 10 17 Round:1° 127°) 134°.133.-139.) 140:..141. 146 
State, with a reason, which of these May 2nd 11 24 
(2) erinates is litely to be the more reliable. May aus 4 3 Round 2.132" 130 140.137 133) 138, 142 
The experimenter plotted a graph of y against ¢, May 4th 7 1B - 
but used only the data in the table below. May Sth 5 18 (a) Calculate Spearman’s rank correlation 
Ay. coefficient between the times for the two 
a _ May 6th 6 16 rounds. 
Xl x 
: poop 34 38 42 46 8 o May 7th 12 15 {b) It was subsequently discovered that rider G 
(minutes), : : The scatter diagram representing this data is Px = 53 2 2 had broken the rules of the competition and 
‘Temperature shown above. B a 1848 YS. 112, 92 Ex’ = 479, 10 seconds was added to his Round 2 time 
s 4718.3 18.6. 18.9 19.3. 19.4 (b) State the value of Spearman’s rank ae > xy = 882.] as a penalty. State, with a reason, what can 
CC), ¥ : : ‘ correlation coefficient for this data, and state Plot the data on a scatter diagram. be said about the value of Spearman’s rank 


what further information its value gives 
about the relationship between x and y. 
State which of the following best indicates 
the relationship between x and y. — 

(i) The product moment correlation 


correlation coefficient calculated from the 
revised data. 

(c) Later still it was discovered that, in Round 
2, riders A and B had to have their times 
interchanged, State, with a reason but 


{e) Plot this graph, and on it draw the line of 
regression. 

{f) State why the linear model could not be (c) 
valid for very large values of the time. 

ig} Using yout graph, comment on whether the 


Calculate the product moment correlation 
coefficient. 

The regression line of x on y has equation 

x = 0.607y — 2.14, and the regression line of y 
on x has equation y = 0.438x + 12.7, where the 


am ae ae rey A without further calculation, whether, as a 
: d state, givin: coefficient. . = coefficients are correct to three signifi : » 2 
— es reomgider tae apr (ii) Spearman’s rank correlation coefficient. figures, Using the se aatbd se fhe ibropeiiie result of this change, the value of 

a reason, whetner you Pe haan eee : ; Spearman’s rank correlation coefficient 
refined model could be found. {L} (iii) The scatter diagram. regression line, estimate the number of hours of Pea : 


Give a reason for your answer. (C) would increase, decrease or stay the same. 


sunshine expected on a day in May when the 


mid-day temperature is 18 °C. i 


Probability 


In this chapter you will learn

• about different ways of estimating probabilities
• how to use probability notation
• about the probability laws, including
  the rule for combined events,
  the ‘or’ rule for mutually exclusive events,
  the ‘and’ rule for independent events
• about conditional probability
• how to use tree diagrams
• about arrangements, selections, permutations and combinations and their application to probability


The probability of an event is a measure of the likelihood that it will happen and it is given on a numerical scale from 0 to 1. The numbers representing probabilities can be written as percentages, fractions or decimals.
A probability of 0 indicates that the event is impossible.
A probability of 1 (i.e. 100%) indicates that the event is certain to happen.
All other events have a probability between 0 and 1.


For example
There is an evens chance of a coin coming down heads when tossed; the probability is 1/2 or 0.5 or 50%.
There is a 1 in 4 chance of cutting a pack of cards at a diamond; the probability is 1/4 or 0.25 or 25%.
The weather forecaster may say that there is a 70% chance of rain.
The likelihood of winning the lottery with one ticket can be shown to be approximately 1 in 14 million, so the probability is 1/14 000 000 ≈ 0.000 000 07.
These probabilities can be shown on a probability scale:
[Probability scale from 0 to 1 (certain), marking winning the lottery jackpot (close to 0), cutting a pack at a diamond (25%), a coin coming down heads (50%) and a 70% chance of rain.]
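The text states only that the lottery figure ‘can be shown’. As a sketch, assume the jackpot requires matching all six numbers drawn from 49; that format is an assumption not stated in the text. The number of equally likely tickets is then the number of combinations of 6 from 49:

```python
from math import comb

# Assumption: one ticket, jackpot needs all 6 numbers chosen from 49
tickets = comb(49, 6)     # number of equally likely selections
p = 1 / tickets

print(tickets)            # 13983816, roughly 14 million
print(f"{p:.2e}")         # 7.15e-08, i.e. about 0.000 000 07
```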


There are different ways of assigning numbers to the probabilities of events, depending on the situation.


EXPERIMENTAL PROBABILITY 


If you drop a drawing pin from a height it lands in one of two positions: point-up or point-down.
[Diagram: a drawing pin lying point-up and point-down.]


Suppose you want to estimate the probability that a drawing pin will land with point-up. You could
(a) take ten identical drawing pins and drop them from a height,
(b) count the number out of the ten with points in the air,
(c) repeat the experiment so that it is carried out a total of 20 times, noting the cumulative number of ‘points-up’ and the cumulative number of pins thrown,
(d) calculate the relative frequency of ‘points-up’ each time, where

$$\text{relative frequency} = \frac{\text{number of ‘points-up’}}{\text{total number of pins thrown}}$$

Here is a table showing the results when this experiment was performed.


Number of ‘points-up’ in 10 drawing pins | Cumulative number of ‘points-up’ | Cumulative number of pins thrown | Relative frequency of ‘points-up’
5 : 
: : 10 3-030 
: te 20 = 0.55 
: 30. 48 
: 21 40: 2 
6 40 3 : 
5 45 4 : 
3 48 30 : 
7 95: i : 
7 62 i S 
f 110 tf 
; 69. 120: ¢ 
7 74 130 
: 78. 140 
; 86 150. 
: 93 160: 
; 1074 170 
; 108 180: 
) 11S 190: 
122 200: 


The results can be illustrated on a graph. 


[Graph: relative frequency of ‘points-up’ (vertical axis) plotted against total number of pins thrown (horizontal axis, 0 to 200).]
i 3 ‘ 
From the graph it appears that when the experiment is repeated a large number of times the relative frequency approaches a limiting value, which is around 0.6. This limiting value is taken as an estimate of the probability.

In general, if an experiment is performed n times under exactly similar conditions and a particular event occurs r times, then the relative frequency r/n is an estimate of the probability of this event. This is known as the experimental probability.

Note that the accuracy of the estimate increases as n increases.


Writing P(A) for the probability of event A,

$$\text{experimental probability } P(A) = \lim_{n \to \infty} \left(\frac{r}{n}\right)$$

where ‘lim’ means the limiting value to which r/n settles as n increases indefinitely.
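The drawing-pin practical can be imitated by simulation. In the sketch below, the ‘true’ probability of landing point-up is assumed to be 0.6, chosen only to match the estimate above. With real pins this value is unknown, which is the whole point of the experiment. The function and parameter names are illustrative.

```python
import random

def relative_frequencies(p_up, drops_per_round=10, rounds=20, seed=1):
    """Simulate the practical: cumulative relative frequency of 'point-up'
    after each round of drops, for an assumed true probability p_up."""
    rng = random.Random(seed)
    ups = thrown = 0
    history = []
    for _ in range(rounds):
        ups += sum(rng.random() < p_up for _ in range(drops_per_round))
        thrown += drops_per_round
        history.append(ups / thrown)
    return history

# 20 rounds mimic the table; 10 000 rounds show the value r/n settles towards
short = relative_frequencies(0.6)
long_run = relative_frequencies(0.6, rounds=10_000)
print([round(f, 2) for f in short[:5]], round(long_run[-1], 3))
```

Early values wander noticeably, while after 100 000 simulated drops the relative frequency lies very close to the assumed 0.6. This mirrors the note that accuracy improves as n increases.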


Experimental probability practicals 


DOMINOES

Place a set of dominoes in a bag. Use a relative frequency method to estimate the probability of drawing from the bag at random two dominoes that have a number in common on one of their halves.


COUNTERS 


You will need a supply of counters of two different colours. Ask someone to mix them up
in a bag in a ratio known only to them.


Use relative frequency methods to estimate the proportion of each colour in the bag. Then 
check with the actual values to see how close your estimate was. 


THREE COINS

Toss three coins a large number of times and use relative frequency methods to estimate the
probability that on any given throw two tails and one head will be obtained.


PROBABILITY WHEN OUTCOMES ARE EQUALLY LIKELY 


When asked the probability of obtaining a head when a fair coin is tossed, you would
probably give the answer 1/2 (or 0.5 or 50%) without bothering to toss a coin a large number
of times and working out the limiting value of the relative frequency of heads occurring.


Intuitively you would have used the definition of probability that applies when the possible 
outcomes are equally likely. 


For equally likely outcomes,

$$\text{probability} = \frac{\text{number of successful outcomes}}{\text{number of possible outcomes}}$$

When tossing a coin there are two possible outcomes, a head or a tail, and if the coin is fair
these are equally likely to occur. Only one of the outcomes is successful (obtaining a head),
so P(head) = 1/2.
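For the THREE COINS practical, the equally likely definition gives a value to compare with the experimental estimate. The sketch lists the outcomes with `itertools.product`, counts the successful ones, and keeps the answer as an exact fraction:

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes for three fair coins, e.g. ('H', 'T', 'T')
outcomes = list(product("HT", repeat=3))
successful = [o for o in outcomes if o.count("T") == 2 and o.count("H") == 1]

p = Fraction(len(successful), len(outcomes))
print(len(outcomes), len(successful), p)   # 8 3 3/8
```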


SUBJECTIVE PROBABILITIES 


When you cannot estimate a probability using experimental methods or equally likely 
outcomes, you may need to employ a subjective method. 


For example, you may wish to estimate the probability that it will snow on Christmas Day, or 
the likelihood that a particular make of car will be stolen. In these cases you have to form a 
subjective probability which you might base on past experience, such as weather records or 
crime figures, on expert opinion or on other factors. This method is, of course, open to error; 


two people faced with the same evidence may give different estimates of the probability. It is
sometimes, however, the only method available.


PROBABILITY NOTATION AND PROBABILITY LAWS 


When deriving mathematical rules for probability it is useful to use the definition based on 
equally likely outcomes, but remember that the results hold for probability in general. 
You need some preliminary definitions:


Any statistical experiment or trial has a number of possible outcomes.
The set of all possible outcomes is called the possibility space, S.
An event A of the experiment is defined to be a subset of S.


Here are some examples: 


• When a die is thrown, the outcomes are the numbers 1 to 6.
So S = {1, 2, 3, 4, 5, 6}.
Define A to be the event ‘the score is less than 3’.
Then A = {1, 2}.
• When two dice are thrown, there are 36 possible outcomes, shown by dots on the possibility space diagram.
Define A to be the event ‘the sum of the scores is 6’. These outcomes are shown ringed in the diagram.
[Possibility space diagram: first die (1 to 6) on the horizontal axis, second die (1 to 6) on the vertical axis.]
In general terms a Venn diagram is often used to show A and S.
[Venn diagram: event A drawn inside the possibility space S.]


The number of outcomes in the possibility space is denoted by n(S).

The number of outcomes in event A is denoted by n(A).

Writing P(A) for the probability of A,

$$P(A) = \frac{n(A)}{n(S)}$$

A is a subset of S, so 0 ≤ n(A) ≤ n(S).

Dividing throughout by n(S) gives

$$0 \le P(A) \le 1$$

Remember that
P(A) = 0 means that event A is impossible,
P(A) = 1 means that event A is certain to happen.
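The two-dice example can be checked by listing the possibility space and counting outcomes, exactly as P(A) = n(A)/n(S) prescribes. The sketch uses Python's `fractions.Fraction` so the answer stays in the same exact form as the text:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # possibility space: 36 ordered pairs
A = [(a, b) for a, b in S if a + b == 6]    # event 'the sum of the scores is 6'

p_A = Fraction(len(A), len(S))              # P(A) = n(A)/n(S)
print(A)     # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
print(p_A)   # 5/36
assert 0 <= p_A <= 1
```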


The complementary event A’ 


A′ denotes the event ‘A does not occur’.

$$n(A') = n(S) - n(A)$$

$$\text{so } P(A') = \frac{n(S) - n(A)}{n(S)} = 1 - \frac{n(A)}{n(S)} = 1 - P(A)$$

Therefore P(A′) = 1 − P(A)
or P(A) + P(A′) = 1

Note that sometimes Ā is written for the complementary event instead of A′.
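The result P(A′) = 1 − P(A) holds because A and A′ together fill the possibility space without overlapping. Here is a small check with the single-die event A = {1, 2} from the earlier example:

```python
from fractions import Fraction

S = set(range(1, 7))                 # one die
A = {s for s in S if s < 3}          # 'the score is less than 3'
A_complement = S - A                 # A' = outcomes of S not in A

p_A = Fraction(len(A), len(S))
p_A_complement = Fraction(len(A_complement), len(S))
print(p_A, p_A_complement)           # 1/3 2/3
assert p_A + p_A_complement == 1     # P(A) + P(A') = 1
```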
Example 3.1
A group of 20 university students contains eight who are in their first year of study. A student 


is picked at random to represent the group at a meeting. Find the probability that the student 
is not in the first year of study. 


Solution 3.1 
Event A: student is in the first year of study. 
$$P(A) = \frac{8}{20} = 0.4$$

so P(A′) = 1 − P(A) = 1 − 0.4 = 0.6.
The probability that the student is not in the first year of study is 0.6. 


Example 3.2 


Two fair coins are tossed. Show the possible outcomes on a possibility space diagram and find 
the probability that two heads are obtained. 


Solution 3.2 


Each coin is equally likely to show a head or a tail.

The possibility space for the outcomes is shown in the diagram, indicating that n(S) = 4.

[Possibility space diagram: first coin (H, T) on the horizontal axis, second coin (H, T) on the vertical axis.]

Event A: two heads are obtained.

There is just one outcome for this, so n(A) = 1.

Therefore P(A) = n(A)/n(S) = 1/4.

The probability that two heads are obtained is 1/4.


Exercise 3a Elementary probability 


1. An ordinary die is thrown. Find the probability
that the number obtained is
(a) a multiple of 3,
(b) … than …,
(c) a factor of 6.

2. In a box of highlighters there are eight which
have dried up and will not write. The box
contains 10 red, 15 blue, 5 green and 10 yellow
highlighters.
A highlighter is picked at random from the box.
Find the probability that
(a) it is blue,
(b) it is neither green nor yellow,
(c) it is not yellow,
(d) it is purple,
(e) it will write.

3. The possibility space consists of the integers from
1 to 20 inclusive.
A is the event ‘the number is a multiple of 3’,
B is the event ‘the number is a multiple of 4’.
An integer is picked at random.
Find (a) P(A), (b) P(B′).

4. Dan carried out an experiment in which 16 coins
were tossed together. The number of tails
obtained from tossing the coins was counted.
This procedure was carried out ten times in all
and the results were
Number of tails: 9, 7, 8, 6, 10, 7, 5, 5, 8, 9
(a) Use Dan’s data to calculate the probability
of obtaining a tail.
The experiment was continued until the 16 coins
were each tossed 100 times.
(b) Calculate the total number of tails that Dan
would expect to obtain.


5. The probability of an event occurring is 0.27.
What is the probability that it will not occur? 


6. A card is drawn at random from an ordinary
pack of 52 playing cards.
(a) Find the probability that the card drawn is
(i) the four of spades,
(ii) the four of spades or any diamond,
(iii) not a picture card (Jack or Queen or
King) of any suit.
(b) The card drawn is the three of diamonds. It
is placed on the table and a second card is
drawn. What is the probability that the
second card drawn is not a diamond?


7. The pupils in a junior school class were asked
how many brothers and sisters they had. Their
answers are shown in the table.

Number of brothers and sisters    0    …
Number of pupils                  …

Find the probability that a child chosen at
random from the class comes from a family with
three children.


8. A cubical die, numbered 1 to 6, is weighted so
that a six is twice as likely to occur as any other
number. Find the probability of
(a) a six occurring,
(b) an odd number occurring.


9. A car manufacturer carried out a survey in which
people were asked which factor from the
following list influenced them most when buying
a car:
A - the colour range available,
B - the servicing costs,
C - driver air bag,
D - fuel economy,
E - range of optional extras.

The pie chart shows the results from 90 people.

The names of those who took part were then
placed in a prize draw.
Find the probability that someone who said
'servicing costs' will win the prize.


10. The durations of 60 telephone calls are
summarised in the table below.

Duration (minutes)   0-   9-   18-   27-   36-45
Number of calls      6    10   21    20    3

Use linear interpolation to estimate the
probability that the duration of a call,
selected at random from the 60 calls, exceeds
30 minutes.


11. The table summarises the results of all the
driving tests taken at a Test Centre during the
first week of September.

        Male   Female
Pass    32     43
Fail    15     8

A person is chosen at random from those who
took their test that week.
(a) Find the probability that the person
(i) passed the driving test,
(ii) was a female who failed her driving
test.
(b) A male is chosen. What is the probability
that he did not pass the test?


12. Wear tests on 100 components gave the following
grouped frequency distribution of life length.

Life length (x hours)   Number of components
500 < x < 530           12
530 < x < 550           24
550 < x < 570           33
570 < x < 600           24
600 < x < 650           7

Use linear interpolation to estimate the
probability that a component drawn at random
from the 100 has a life length between 540 and
580 hours. (C)


13. Two ordinary unbiased dice are thrown.
Find the probability that
(a) the sum on the two dice is 3,
(b) the sum on the two dice exceeds 9,
(c) the two dice show the same number,
(d) the numbers on the two dice differ by more
than 2.
14. Two fair cubical dice are thrown simultaneously
and the scores multiplied. P(x) denotes the
probability that the number x will be obtained.
(a) Calculate (i) P(9) (ii) P(4) (iii) P(14)
(b) If P(x) = …, find the possible values of x.


ILLUSTRATING TWO OR MORE EVENTS USING VENN DIAGRAMS


Suppose A and B are two events associated with the same experiment. Consider the outcomes described below.

(a) A ∪ B

In set language, the set that contains the outcomes that are in A or B or both is called the union of A and B and is written A ∪ B.

To represent A ∪ B on the Venn diagram, shade the coloured 'figure-of-eight' shape. This outcome is often written 'A or B'. Note that although this outcome is written 'A or B', it includes the outcomes that are in both A and B as well.

A ∪ B means A or B or both.

(b) A ∩ B

In set language, the set that contains the outcomes that are in both A and B is called the intersection of A and B and is written A ∩ B.

To represent A ∩ B on the Venn diagram, shade the overlap of A and B. This outcome is often written 'A and B'.

A ∩ B means A and B.


PROBABILITY RULE FOR COMBINED EVENTS 


If the number of outcomes in A is n(A) and the number of outcomes in B is n(B), then for two overlapping sets A and B, if you add n(A) and n(B) together you will count the overlap A ∩ B twice.

So to find the number of outcomes in A ∪ B you have to take one overlap away, like this:

n(A ∪ B) = n(A) + n(B) − n(A ∩ B)

Dividing by n(S), this becomes

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Alternatively

P(A or B) = P(A) + P(B) − P(A and B)

Remember that the word 'or' means A or B or both.
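The counting argument behind this rule can be checked by brute force. The minimal Python sketch below assumes equally likely outcomes on one throw of an ordinary die; the two events (an even number, a multiple of 3) are illustrative choices, not taken from the text. It counts A ∪ B directly and compares the result with P(A) + P(B) − P(A ∩ B).

```python
from fractions import Fraction

# Possibility space for one throw of an ordinary die (illustrative choice of events).
S = set(range(1, 7))
A = {n for n in S if n % 2 == 0}   # 'an even number'  -> {2, 4, 6}
B = {n for n in S if n % 3 == 0}   # 'a multiple of 3' -> {3, 6}

def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

# Direct count of A ∪ B versus the rule P(A) + P(B) - P(A ∩ B).
direct = P(A | B)
by_rule = P(A) + P(B) - P(A & B)
print(direct, by_rule)  # 2/3 2/3
```

Without the subtraction, the outcome 6 (which lies in the overlap) would be counted twice and the sum would come to 5/6.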
Other useful results relating two events A and B

P(A ∩ B) = P(B ∩ A), i.e. P(A and B) = P(B and A)

P(A) = P(A ∩ B) + P(A ∩ B'), i.e. P(A) = P(A and B) + P(A but not B)

P(B) = P(B ∩ A) + P(B ∩ A'), i.e. P(B) = P(B and A) + P(B but not A)

P(A' ∩ B') = 1 − P(A ∪ B), i.e. P(neither A nor B) = 1 − P(A or B)


Example 3.3
In a class of 20 children, 4 of the 9 boys and 3 of the 11 girls are in the athletics team. A person from the class is chosen to be in the 'egg and spoon' race. Find the probability that the person chosen is
(a) in the athletics team,
(b) female,
(c) a female member of the athletics team,
(d) a female or in the athletics team.

Solution 3.3
Possibility space S: the class of 20 people.
(a) Event A: a member of the athletics team is chosen, P(A) = 7/20 = 0.35
(b) Event F: a female is chosen, P(F) = 11/20 = 0.55
(c) P(female member of the athletics team) = P(A and F)
There are 3 girls in the athletics team, so
P(A and F) = 3/20 = 0.15
(d) P(A or F) = P(A) + P(F) − P(A and F)
= 0.35 + 0.55 − 0.15
= 0.75


Example 3.4
Events C and D are such that P(C) = 19/30, P(D) = 2/5 and P(C ∪ D) = 4/5.
Find P(C ∩ D).

Solution 3.4
Using P(C ∪ D) = P(C) + P(D) − P(C ∩ D)
4/5 = 19/30 + 2/5 − P(C ∩ D)
P(C ∩ D) = 19/30 + 12/30 − 24/30
= 7/30


Example 3.5
In a survey, 15% of the participants said that they had never bought lottery tickets or premium bonds, 73% had bought lottery tickets and 49% had bought premium bonds.

Find the probability that a person chosen at random from those taking part in the survey
(a) had bought lottery tickets or premium bonds,
(b) had bought lottery tickets and premium bonds,
(c) had bought lottery tickets only.

Solution 3.5
L: person has bought lottery tickets, P(L) = 0.73
B: person has bought premium bonds, P(B) = 0.49
P(neither L nor B) = 0.15
(a) P(L or B) = 1 − P(neither L nor B)
= 1 − 0.15
= 0.85
(b) Use P(L or B) = P(L) + P(B) − P(L and B)
0.85 = 0.73 + 0.49 − P(L and B)
P(L and B) = 0.73 + 0.49 − 0.85
= 0.37
(c) P(L only) = P(L) − P(L and B)
= 0.73 − 0.37
= 0.36

Showing all the percentages on a Venn diagram:
[Venn diagram: L only 36%, L and B 37%, B only 12%, neither 15%]
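As a check on Solution 3.5, the Python sketch below uses exact fractions (the variable names are illustrative) to derive all four regions of the Venn diagram from the three survey figures, then confirms that the regions add up to 1.

```python
from fractions import Fraction

# Figures from Example 3.5, kept as exact fractions to avoid rounding noise.
p_L = Fraction(73, 100)        # bought lottery tickets
p_B = Fraction(49, 100)        # bought premium bonds
p_neither = Fraction(15, 100)  # bought neither

p_L_or_B = 1 - p_neither              # (a) complement of 'neither'
p_L_and_B = p_L + p_B - p_L_or_B      # (b) rearranged addition rule
p_L_only = p_L - p_L_and_B            # (c) L but not B
p_B_only = p_B - p_L_and_B            # B but not L (the fourth region)

regions = {"L only": p_L_only, "L and B": p_L_and_B,
           "B only": p_B_only, "neither": p_neither}
for name, p in regions.items():
    print(f"{name}: {float(p):.2f}")
print(sum(regions.values()))  # 1
```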


Example 3.6 


Events A and B are such that P(A) = 0.3, P(B) = 0.4, P(A ∩ B) = 0.1.
Find (a) P(A ∩ B'), (b) P(A' ∩ B').


Solution 3.6
(a) P(A) = P(A ∩ B) + P(A ∩ B')
0.3 = 0.1 + P(A ∩ B')
P(A ∩ B') = 0.2

(b) P(A' ∩ B') = 1 − P(A ∪ B)
where P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
= 0.3 + 0.4 − 0.1
= 0.6
P(A' ∩ B') = 1 − P(A ∪ B)
= 1 − 0.6
= 0.4


Example 3.7
A group of 50 people were asked which of three newspapers, A, B or C, they read. The results showed that 25 read A, 16 read B, 14 read C, 5 read both A and B, 4 read both B and C, 6 read both C and A and 2 read all 3.

(a) Represent these data on a Venn diagram.

Find the probability that a person selected at random from this group reads
(b) at least 1 of the newspapers,
(c) only 1 of the newspapers,
(d) only A.


Solution 3.7
(a) Draw 3 overlapping sets to represent A, B and C and fit in the numbers given, starting with the 2 who read all three.
[Venn diagram: only A 16, only B 9, only C 6, A and B only 3, B and C only 2, C and A only 4, all three 2, none 8]

(b) P(reads at least one) = 1 − P(reads none)
= 1 − 8/50 = 42/50 = 0.84

(c) P(reads only one) = P(reads only A) + P(reads only B) + P(reads only C)
= 16/50 + 9/50 + 6/50 = 31/50 = 0.62

(d) P(reads only A) = 16/50 = 0.32
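The same Venn-diagram bookkeeping for three sets can be sketched in Python. The counts below are the ones consistent with the answers 0.84, 0.62 and 0.32 in Solution 3.7 (25 read A, 16 read B, 14 read C, 6 read both C and A, in a group of 50); treat them as a reconstruction. The 'only' regions are found by working outwards from the centre of the diagram, and inclusion-exclusion gives the number reading at least one paper.

```python
from fractions import Fraction

total = 50
n_A, n_B, n_C = 25, 16, 14       # readers of each paper
n_AB, n_BC, n_CA = 5, 4, 6       # readers of each pair (including the 2 who read all 3)
n_ABC = 2                        # readers of all three

# Inclusion-exclusion for three sets.
at_least_one = n_A + n_B + n_C - n_AB - n_BC - n_CA + n_ABC

# Remove both pair overlaps, then add back the centre (it was removed twice).
only_A = n_A - n_AB - n_CA + n_ABC
only_B = n_B - n_AB - n_BC + n_ABC
only_C = n_C - n_BC - n_CA + n_ABC

print(Fraction(at_least_one, total))              # 21/25, i.e. 0.84
print(Fraction(only_A + only_B + only_C, total))  # 31/50, i.e. 0.62
print(Fraction(only_A, total))                    # 8/25, i.e. 0.32
```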


EXCLUSIVE (OR MUTUALLY EXCLUSIVE) EVENTS 


Consider events, A and B, of the same experiment. 
A and B are said to be exclusive (or mutually exclusive) if they cannot occur at the same time. 
For example, with one throw of a die you cannot score a three and a five at the same time, so 
the events ‘scoring a 3’ and ‘scoring a 5’ are exclusive events. 


If A and B are exclusive, then P(A ∩ B) = 0 since A ∩ B is an impossible event. There is no overlap of A and B.
For exclusive events, the rule for combined events becomes

P(A ∪ B) = P(A) + P(B)

This is known as the addition rule for exclusive events.
It is also known as the 'or' rule for exclusive events:
P(A or B) = P(A) + P(B)

Extending this result to n exclusive events A₁, A₂, A₃, ..., Aₙ:

P(A₁ or A₂ or A₃ ... or Aₙ) = P(A₁) + P(A₂) + P(A₃) + ... + P(Aₙ)


Example 3.8 


In a race in which there are no dead heats, the probability that John wins is 0.3, the 
probability that Paul wins is 0.2 and the probability that Mark wins is 0.4. 


Find the probability that 


(a) John or Mark wins, 
(b) John or Paul or Mark wins, 


(c) someone else wins.


Solution 3.8 
Since only one person wins, the events are mutually exclusive. 


(a) P(John or Mark wins) = P(John wins) + P(Mark wins)
= 0.3 + 0.4 = 0.7

(b) P(John or Paul or Mark wins) = P(John wins) + P(Paul wins) + P(Mark wins)
= 0.3 + 0.2 + 0.4 = 0.9

(c) P(someone else wins) = 1 − 0.9 = 0.1
Example 3.9

A card is drawn from an ordinary pack of 52 playing cards. Find the probability that the card is

(a) a club or a diamond,
(b) a club or a King.

Solution 3.9
Possibility space S: the pack of 52 cards.
C: a club is drawn, so P(C) = n(C)/n(S) = 13/52 = 1/4
D: a diamond is drawn, so P(D) = n(D)/n(S) = 13/52 = 1/4

(a) Since a card cannot be both a club and a diamond, the events C and D are mutually exclusive.
Therefore P(C or D) = P(C) + P(D)
= 1/4 + 1/4
= 1/2

(b) Event K: a King is drawn, so P(K) = n(K)/n(S) = 4/52 = 1/13

The events C and K are not mutually exclusive since a card can be both a King and a club.

Therefore
P(C and K) = P(King of clubs) = 1/52
P(C or K) = P(C) + P(K) − P(C and K)
= 13/52 + 4/52 − 1/52
= 16/52 = 4/13
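Example 3.9 can be verified by enumerating the whole pack. This Python sketch (the rank and suit labels are illustrative) counts cards satisfying each condition, showing that the simple 'or' rule works for the exclusive events C and D but overcounts for C and K, where the King of clubs lies in both.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) cards

def P(condition):
    """Proportion of cards in the deck that satisfy `condition`."""
    return Fraction(sum(1 for card in deck if condition(card)), len(deck))

is_club = lambda card: card[1] == "clubs"
is_diamond = lambda card: card[1] == "diamonds"
is_king = lambda card: card[0] == "K"

# (a) Exclusive events: the direct count agrees with P(C) + P(D).
print(P(lambda c: is_club(c) or is_diamond(c)), P(is_club) + P(is_diamond))  # 1/2 1/2
# (b) Overlapping events: the plain sum double-counts the King of clubs.
print(P(lambda c: is_club(c) or is_king(c)), P(is_club) + P(is_king))        # 4/13 17/52
```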


EXHAUSTIVE EVENTS 
If two events A and B are such that between them they make up the whole of the possibility space, then A and B are said to be exhaustive events and P(A ∪ B) = 1.

For example, if
S = {the integers from 1 to 10 inclusive},
A = {the integers below 7} = {1, 2, 3, 4, 5, 6},
B = {the integers above 5} = {6, 7, 8, 9, 10},
then A ∪ B = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} = S.
Special case:
Consider an event A and its complementary event A'.

Any event A and its complementary event A' are both mutually exclusive and exhaustive.
Extending this to n events:

If A₁, A₂, A₃, ..., Aₙ are n events which between them make up the whole possibility space without overlapping, then
P(A₁) + P(A₂) + P(A₃) + ... + P(Aₙ) = 1

and the n events are both mutually exclusive and exhaustive.
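The two properties can be tested mechanically when events are written as sets of outcomes. In this Python sketch the helper names is_exhaustive and is_mutually_exclusive are hypothetical, chosen for illustration. It confirms that the A and B of the example above are exhaustive but not exclusive (6 lies in both), whereas A and A' are both.

```python
from fractions import Fraction

def is_exhaustive(events, S):
    """True if the events (sets of outcomes) together cover the possibility space S."""
    return set().union(*events) == S

def is_mutually_exclusive(events):
    """True if no outcome lies in more than one event (pairwise disjoint)."""
    return sum(len(e) for e in events) == len(set().union(*events))

S = set(range(1, 11))        # the integers from 1 to 10 inclusive
A = {1, 2, 3, 4, 5, 6}       # the integers below 7
B = {6, 7, 8, 9, 10}         # the integers above 5
print(is_exhaustive([A, B], S), is_mutually_exclusive([A, B]))            # True False

A_comp = S - A               # the complementary event A'
print(is_exhaustive([A, A_comp], S), is_mutually_exclusive([A, A_comp]))  # True True

# With equally likely outcomes, the probabilities of such a partition sum to 1.
print(sum(Fraction(len(e), len(S)) for e in (A, A_comp)))                 # 1
```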


Exercise 3b Probability - combined events 


1. An ordinary die is thrown. Find the probability
that the number obtained is
(a) even, (b) prime, (c) even or prime.

2. In a group of 30 students all study at least one of
the subjects Physics and Biology. 20 attend the
Physics class and 21 attend the Biology class.
Find the probability that a student chosen at
random studies both Physics and Biology.

3. From an ordinary pack of 52 playing cards the
seven of diamonds has been lost. A card is dealt
from the well-shuffled pack. Find the probability
that it is (a) a diamond, (b) a Queen, (c) a
diamond or a Queen, (d) a diamond or a seven.

4. For events A and B it is known that P(A) = …,
P(A ∪ B) = … and P(A ∩ B) = …. Find P(B).

5. For events C and D,
P(C) = 0.7, P(D ∪ C) = 0.9, P(D ∩ C) = 0.3.
Find (a) P(D), (b) P(D' ∩ C),
(c) P(D ∩ C'), (d) P(D' ∩ C').

6. Tests are carried out on three machines A, B and
C to assess the likelihood that each machine will
produce a faulty component. The results are
summarised in the table.

             Faulty   Not faulty
Machine A    3        …
Machine B    2        8
Machine C    5        15

A component is chosen at random from those
tested.
(a) Find the probability that the component
chosen
(i) is from Machine A,
(ii) is a faulty component from Machine C,
(iii) is not faulty or is from Machine A.
(b) It is known that the component chosen is
faulty. Find the probability that it is from
Machine B.

7. It is known that P(X) = … and P(Y) = …. Given
that X and Y are mutually exclusive, find
(a) P(X ∪ Y), (b) P(Y ∩ X), (c) P(Y ∩ X').

8. For events A and B it is known that P(A) = P(B),
P(A ∩ B) = 0.1 and P(A ∪ B) = 0.7.
Find P(A').

9. The probability that a boy in Class 2 is in the
football team is 0.4 and the probability that he is
in the chess team is 0.5. If the probability that a
boy in the class is in both teams is 0.2, find the
probability that a boy chosen at random is in the
football or the chess team.
10. Two ordinary dice are thrown. Find the
probability that the sum of the scores obtained
(a) is a multiple of 5,
(b) is greater than 9,
(c) is a multiple of 5 or is greater than 9,
(d) is a multiple of 5 and is greater than 9.

11. Given that P(A') = …, P(B) = … and
P(A ∩ B) = …, find P(A ∪ B).

12. Two ordinary dice are thrown. Find the
probability that
(a) at least one six is thrown,
(b) at least one three is thrown,
(c) at least one six or at least one three is
thrown.

13. A and B are two events such that P(A) = …,
P(B) = … and P(A ∩ B) = …. Are A and B
exhaustive events?

14. Give two examples of events which are both
mutually exclusive and exhaustive.

15. Two coins are tossed. A is the event 'at least one
head is obtained'. Describe an event B such that
A and B are exhaustive events.

16. In a large garden there are seven fruit trees and
13 other types of tree. Six of the trees have birds
nesting in them but only two of these are fruit
trees.
(a) Copy and complete the table below to
illustrate this information.

              Fruit tree   Other tree   Total
Bird's nest   2                         6
No nest
Total         7            13

The owner of the garden has given permission
for Abdul to play in the garden but has
instructed him not to climb any fruit trees or
trees that have birds nesting in them. Abdul
selects a tree at random to climb.
(b) Find the probability that Abdul will obey
the owner's instructions.
Given that Abdul climbs a fruit tree,
(c) find the probability that the tree has birds
nesting in it. (L)


CONDITIONAL PROBABILITY


If A and B are two events, not necessarily from the same experiment, then the conditional probability that A occurs, given that B has already occurred, is written P(A, given B) or P(A|B).

In the Venn diagram, the possibility space is reduced to just B, since B has already occurred, so

P(A|B) = n(A ∩ B)/n(B)

Dividing top and bottom by n(S):

P(A|B) = P(A ∩ B)/P(B)

So P(A, given B) = P(A and B)/P(B), i.e. P(A|B) = P(A ∩ B)/P(B).

Rearranging:

P(A ∩ B) = P(A|B) × P(B)

Remember P(A ∩ B) = P(B ∩ A). It is also true that

P(A ∩ B) = P(B|A) × P(A)

so

P(A|B) × P(B) = P(B|A) × P(A)


Example 3.10

When a die was thrown, the score was an odd number. What is the probability that it was a prime number?

Solution 3.10

There are three odd numbers, 1, 3 and 5, so P(odd) = 3/6 = 1/2.

There are two numbers, 3 and 5, that are prime and odd, so P(prime and odd) = 2/6 = 1/3.

P(prime, given odd) = P(prime and odd)/P(odd) = (1/3)/(1/2) = 2/3

It is possible to deduce this straightaway, since the possibility space is reduced to the three odd numbers 1, 3, 5, and two of these, 3 and 5, are prime.


Example 3.11
In a certain college
65% of the students are full-time students,
55% of the students are female,
35% of the students are male full-time students.

Find the probability that
(a) a student chosen at random from all the students in the college is a part-time student,
(b) a student chosen at random from all the students in the college is female and a part-time student,
(c) a student chosen at random from all the female students in the college is a part-time student. (NEAB)

Solution 3.11
Define events as follows:
F: student is female, P(F) = 0.55
M: student is male, P(M) = 1 − 0.55 = 0.45
Full: student is full-time, P(Full) = 0.65
(a) P(student is part-time) = 1 − 0.65 = 0.35
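The two routes to the answer in Solution 3.10 can be compared directly. This minimal Python sketch assumes a fair die (equally likely outcomes); the optional `space` argument of the helper P is an illustrative device for shrinking the possibility space to the event that has already occurred.

```python
from fractions import Fraction

S = set(range(1, 7))                 # one throw of an ordinary die
odd = {n for n in S if n % 2 == 1}   # {1, 3, 5}
prime = {2, 3, 5}

def P(event, space=S):
    """Probability of `event` when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

# Route 1: the formula P(prime | odd) = P(prime ∩ odd) / P(odd).
by_formula = P(prime & odd) / P(odd)
# Route 2: reduce the possibility space to the odd scores and count directly.
by_reduced_space = P(prime, space=odd)
print(by_formula, by_reduced_space)  # 2/3 2/3
```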
(b) Given that 35% are male full-time students, P(M ∩ Full) = 0.35
Also P(Full) = P(M ∩ Full) + P(F ∩ Full)
0.65 = 0.35 + P(F ∩ Full)
P(F ∩ Full) = 0.30
P(F) = P(F ∩ Full) + P(F ∩ Part)
0.55 = 0.30 + P(F ∩ Part)
P(female and part-time) = 0.25

(c) P(Part, given F) = P(Part and F)/P(F)
= 0.25/0.55
= 0.45 (2 d.p.)
P(student chosen from female students is part-time) = 0.45 (2 d.p.)
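Solution 3.11 amounts to filling in a two-way table of proportions. The Python sketch below repeats the steps with exact fractions (variable names are illustrative), which also shows that the rounded answer 0.45 to part (c) is exactly 5/11.

```python
from fractions import Fraction

# Figures from Example 3.11, as exact fractions.
p_full = Fraction("0.65")        # full-time
p_female = Fraction("0.55")      # female
p_male_full = Fraction("0.35")   # male and full-time

p_part = 1 - p_full                              # (a) complement of full-time
p_female_full = p_full - p_male_full             # full-time splits into male + female
p_female_part = p_female - p_female_full         # (b) female splits into full + part
p_part_given_female = p_female_part / p_female   # (c) conditional probability

print(p_part, p_female_part, p_part_given_female)  # 7/20 1/4 5/11
print(round(float(p_part_given_female), 2))        # 0.45
```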


Example 3.12 
X and Y are two events such that P(X|Y) = 0.4, P(Y) = 0.25 and P(X) = 0.2.

Find
(a) P(Y|X)   (b) P(X ∩ Y)   (c) P(X ∪ Y)

Solution 3.12

(a) P(Y|X) × P(X) = P(X|Y) × P(Y)
P(Y|X) × 0.2 = 0.4 × 0.25
P(Y|X) = 0.5
(b) P(X ∩ Y) = P(X|Y) × P(Y)
= 0.4 × 0.25
= 0.1
(c) P(X ∪ Y) = P(X) + P(Y) − P(X ∩ Y)
= 0.2 + 0.25 − 0.1
= 0.35


Example 3.13 
A group of girls at a school is entered for Advanced Level Mathematics modules.
Each girl takes only module M1 or only module M2 or both M1 and M2.
The probability that a girl is taking M2 given that she is taking M1 is 1/5.
The probability that a girl is taking M1 given that she is taking M2 is 1/3.

Find the probability that
(a) a girl selected at random is taking both M1 and M2,
(b) a girl selected at random is taking only M1. (L)
Solution 3.13

Events
M₁: a girl takes module M1
M₂: a girl takes module M2

You are given that P(M₂|M₁) = 1/5 and P(M₁|M₂) = 1/3.
Since each girl takes one or both modules, P(M₁ ∪ M₂) = 1.

(a) Let P(M₁ ∩ M₂) = x.
P(M₂|M₁) = P(M₁ ∩ M₂)/P(M₁), so 1/5 = x/P(M₁)
P(M₁) = 5x
Also P(M₁|M₂) = P(M₁ ∩ M₂)/P(M₂), so 1/3 = x/P(M₂)
P(M₂) = 3x
P(M₁ ∪ M₂) = P(M₁) + P(M₂) − P(M₁ ∩ M₂)
But M₁ and M₂ are exhaustive events, so P(M₁ ∪ M₂) = 1 and
1 = 5x + 3x − x
1 = 7x
x = 1/7
P(a girl is taking M1 and M2) = 1/7

(b) P(M₁) = 5x = 5/7, P(M₂) = 3x = 3/7
P(taking only M1) = P(M₁) − P(M₁ ∩ M₂)
= 5/7 − 1/7
= 4/7
P(a girl is taking only M1) = 4/7
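The algebra in Solution 3.13 can be expressed generally: each conditional probability fixes P(M1) and P(M2) as multiples of x = P(M1 ∩ M2), and the exhaustive condition pins x down. This Python sketch takes the values 1/5 and 1/3 as read from the solution's working (P(M1) = 5x, P(M2) = 3x); treat those inputs as a reconstruction.

```python
from fractions import Fraction

p_M2_given_M1 = Fraction(1, 5)
p_M1_given_M2 = Fraction(1, 3)

# With x = P(M1 ∩ M2): P(M1) = x / P(M2|M1) and P(M2) = x / P(M1|M2),
# so each probability is a fixed multiple of x.
k1 = 1 / p_M2_given_M1   # P(M1) = k1 * x
k2 = 1 / p_M1_given_M2   # P(M2) = k2 * x

# Every girl takes at least one module: 1 = P(M1) + P(M2) - P(M1 ∩ M2) = (k1 + k2 - 1) x
x = 1 / (k1 + k2 - 1)
p_only_M1 = k1 * x - x
print(x, p_only_M1)  # 1/7 4/7
```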


INDEPENDENT EVENTS 


If either of the events A and B can occur without being affected by the other, then the two events are said to be independent.

If A and B are independent, then P(A, given B has occurred) is precisely the same as P(A),
i.e. P(A|B) = P(A).

It is also true that P(B|A) = P(B).
Now, since P(A ∩ B) = P(A|B) × P(B), for independent events this becomes
P(A ∩ B) = P(A) × P(B)
This is the multiplication rule for independent events.
It is also known as the 'and' rule for independent events:
P(A and B) = P(A) × P(B)

So there are three conditions for A and B to be independent and any one of them may be used as a test for independence:
P(A ∩ B) = P(A) × P(B)
P(A|B) = P(A)
P(B|A) = P(B)

The multiplication law can be extended to any number of independent events:
P(A₁ and A₂ and ... Aₙ) = P(A₁) × P(A₂) × ... × P(Aₙ)


Example 3.14
A fair die is thrown twice. Find the probability that two fives are thrown.

Solution 3.14
On one throw, P(5) = 1/6.
On two throws, P(5₁ and 5₂) = P(5₁) × P(5₂)   (independent events)
= 1/6 × 1/6
= 1/36
P(two fives are thrown) = 1/36


Example 3.15
In a group of 60 students, 20 study History, 24 study French and 8 study both History and French. Are the events 'a student studies History' and 'a student studies French' independent?

Solution 3.15
From the information given:
P(History) = 20/60 = 1/3, P(French) = 24/60 = 2/5, P(History and French) = 8/60 = 2/15
Now P(History) × P(French) = 1/3 × 2/5 = 2/15
So P(History and French) = P(History) × P(French).
The two events are independent.


Example 3.16
Events A and B are independent and P(A) = …, P(A ∩ B) = ….
Find (a) P(B), (b) P(A ∪ B).

Solution 3.16
(a) Since A and B are independent,
P(A ∩ B) = P(A) × P(B)
so P(B) = P(A ∩ B)/P(A) = …
(b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = …


Example 3.17
The events A and B are such that P(A|B) = 0.4, P(B|A) = 0.25, P(A ∩ B) = 0.12.
(a) Calculate the value of P(B).
(b) Give a reason why A and B are not independent.
(c) Calculate the value of P(A ∩ B'). (L)

Solution 3.17
(a) P(A|B) = P(A ∩ B)/P(B)
0.4 = 0.12/P(B)
P(B) = 0.12/0.4 = 0.3
(b) P(B|A) = 0.25, but P(B) = 0.3, so P(B|A) ≠ P(B).
A and B are not independent.
(c) P(B|A) = P(A ∩ B)/P(A)
0.25 = 0.12/P(A)
P(A) = 0.48
Also P(A) = P(A ∩ B) + P(A ∩ B')
So 0.48 = 0.12 + P(A ∩ B')
P(A ∩ B') = 0.36
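Any of the three conditions can be coded as a test. This Python sketch (the helper name independent is hypothetical) applies the multiplication-rule test to Example 3.15 and the P(B|A) = P(B) test to Example 3.17. Exact fractions are used deliberately: comparing floating-point products with == can fail because of rounding even when the events really are independent.

```python
from fractions import Fraction

def independent(p_a, p_b, p_a_and_b):
    """Multiplication-rule test: independent iff P(A ∩ B) = P(A) × P(B)."""
    return p_a_and_b == p_a * p_b

# Example 3.15
p_history = Fraction(20, 60)
p_french = Fraction(24, 60)
p_both = Fraction(8, 60)
print(p_history * p_french, p_both, independent(p_history, p_french, p_both))
# 2/15 2/15 True

# Example 3.17: recover P(B) and P(A), then compare P(B | A) with P(B).
p_a_and_b = Fraction("0.12")
p_b = p_a_and_b / Fraction("0.4")        # from P(A | B) = 0.4
p_a = p_a_and_b / Fraction("0.25")       # from P(B | A) = 0.25
print(p_b, p_a, p_a_and_b / p_a == p_b)  # 3/10 12/25 False
```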
Example 3.18
The events A and B are such that
P(A) = 0.45, P(B) = 0.35 and P(A ∪ B) = 0.7.
(a) Find the value of P(A ∩ B).
(b) Explain why the events A and B are not independent.
(c) Find the value of P(A|B).

Solution 3.18
(a) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
0.7 = 0.45 + 0.35 − P(A ∩ B)
P(A ∩ B) = 0.8 − 0.7 = 0.1
(b) P(A) × P(B) = 0.45 × 0.35
= 0.1575
≠ P(A ∩ B)
A and B are not independent.
(c) P(A|B) = P(A ∩ B)/P(B)
= 0.1/0.35
= 0.286 (3 d.p.)


It can be shown that if A and B are independent, then A' and B' are also independent.

For independent events A and B
P(A' and B') = P(A') × P(B')
and P(A'|B') = P(A')
P(B'|A') = P(B')


Example 3.19
The events A and B are independent and are such that P(A) = x, P(B) = x + 0.2 and P(A ∩ B) = 0.15.

(a) Find the value of x.

For this value of x, find
(b) P(A ∪ B),
(c) P(A'|B'). (L)

Solution 3.19
(a) Using the rule for independent events
P(A ∩ B) = P(A) × P(B)
0.15 = x(x + 0.2)
By guesswork, x = 0.3, since 0.3 × 0.5 = 0.15.

Alternative algebraic method:
x² + 0.2x = 0.15
(x + 0.1)² − 0.01 = 0.15   (completing the square)
(x + 0.1)² = 0.16
x + 0.1 = ±0.4   (taking the square root)
Either x = 0.3 or x = −0.5

The negative value is impossible for a probability, so x = 0.3.
P(A) = 0.3 and P(B) = 0.5

(b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
= 0.3 + 0.5 − 0.15
= 0.65

(c) Since A and B are independent, so are A' and B'.
P(A'|B') = P(A')
= 1 − P(A)
= 1 − 0.3 = 0.7
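The algebraic method in Solution 3.19 can be sketched in Python using the quadratic formula rather than completing the square (both give the same roots). The filter keeps only the root for which both P(A) = x and P(B) = x + 0.2 lie between 0 and 1, mirroring the rejection of x = −0.5.

```python
import math

# Independence gives x(x + 0.2) = 0.15, i.e. x^2 + 0.2x - 0.15 = 0.
a, b, c = 1.0, 0.2, -0.15
disc = b * b - 4 * a * c
roots = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1)]

# Keep only a root that makes both P(A) = x and P(B) = x + 0.2 valid probabilities.
x = next(r for r in roots if 0 <= r <= 1 and 0 <= r + 0.2 <= 1)
p_a, p_b = x, x + 0.2

p_union = p_a + p_b - 0.15      # (b) addition rule
p_notA_given_notB = 1 - p_a     # (c) A' and B' are independent, so P(A'|B') = P(A')
print(round(x, 10), round(p_union, 10), round(p_notA_given_notB, 10))  # 0.3 0.65 0.7
```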


Example 3.20 


The probability that a certain type of machine will break down in the first month of operation 
is 0.1. If a firm has two such machines which are installed at the same time, find the 
probability that, at the end of the first month, just one has broken down. 


Assume that the performances of the two machines are independent. 


Solution 3.20 


M₁: machine 1 breaks down, P(M₁) = 0.1, P(M₁') = 0.9
M₂: machine 2 breaks down, P(M₂) = 0.1, P(M₂') = 0.9
If just one machine breaks down, then
either machine 1 has broken down and machine 2 is still working (M₁ ∩ M₂')
or machine 1 is still working and machine 2 has broken down (M₁' ∩ M₂).
Now M₁ and M₂' are independent, as are M₁' and M₂,
so P(M₁ ∩ M₂') + P(M₁' ∩ M₂) = P(M₁) × P(M₂') + P(M₁') × P(M₂)
= 0.1 × 0.9 + 0.9 × 0.1
= 0.18


The probability that after one month just one machine has broken down is 0.18. 


Example 3.21 


Three people in an office decide to enter a marathon race. The respective probabilities that 
they will complete the marathon are 0.9, 0.7 and 0.6. 


Assuming that their performances are independent, find the probability that 


(a) they all complete the marathon, 
(b) at least two complete the marathon. 
Solution 3.21
A: the first person completes the marathon, P(A) = 0.9, P(A') = 0.1
B: the second person completes the marathon, P(B) = 0.7, P(B') = 0.3
C: the third person completes the marathon, P(C) = 0.6, P(C') = 0.4
(a) P(all three complete) = P(A) × P(B) × P(C)   (independent events)
= 0.9 × 0.7 × 0.6
= 0.378
(b) If at least two complete the marathon, then either exactly two of them do, or all three do.
P(all three complete) = 0.378 from part (a)
P(exactly two complete) = P(A) × P(B) × P(C') + P(A) × P(B') × P(C) + P(A') × P(B) × P(C)
= 0.9 × 0.7 × 0.4 + 0.9 × 0.3 × 0.6 + 0.1 × 0.7 × 0.6
= 0.456
P(at least two complete) = 0.378 + 0.456
= 0.834
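Examples 3.20 and 3.21 both ask for the probability that exactly a given number of independent events occur. The Python sketch below (the function name prob_exactly is hypothetical) lists every occur/fail pattern, multiplies the probabilities along each pattern (the multiplication rule for independent events and their complements), and adds the patterns that qualify, since different patterns are mutually exclusive.

```python
from itertools import product
import math

def prob_exactly(ps, k):
    """P(exactly k of the independent events occur), given their probabilities ps."""
    total = 0.0
    # Each pattern is a tuple of True/False saying which events occur.
    for pattern in product([True, False], repeat=len(ps)):
        if sum(pattern) == k:
            total += math.prod(p if occurs else 1 - p for p, occurs in zip(ps, pattern))
    return total

# Example 3.20: two machines, each breaks down with probability 0.1.
print(round(prob_exactly([0.1, 0.1], 1), 3))  # 0.18

# Example 3.21: completion probabilities 0.9, 0.7 and 0.6.
ps = [0.9, 0.7, 0.6]
all_three = prob_exactly(ps, 3)
at_least_two = prob_exactly(ps, 2) + all_three
print(round(all_three, 3), round(at_least_two, 3))  # 0.378 0.834
```

Listing patterns grows as 2ⁿ, which is fine for a handful of people or machines.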
It is important not to confuse the terms 'mutually exclusive' and 'independent'.
Mutually exclusive events are events that cannot happen together. They are usually the outcomes of one experiment.
Independent events are events that can happen simultaneously or can be seen to happen one after the other.

These three results are particularly useful. Learn them.

P(A and B) = P(A given B) × P(B)
P(A ∩ B) = P(A|B) × P(B)

For mutually exclusive events
P(A or B) = P(A) + P(B)
P(A ∪ B) = P(A) + P(B)

For independent events
P(A and B) = P(A) × P(B)
P(A ∩ B) = P(A) × P(B)


Example 3.22
The three events E₁, E₂ and E₃ are defined in the same sample space. The events E₁ and E₃ are mutually exclusive. The events E₁ and E₂ are independent.
Given that P(E₁) = …, P(E₃) = … and P(E₁ ∪ E₂) = …, find
(a) P(E₁ ∪ E₃),
(b) P(E₂).

Solution 3.22
(a) Since E₁ and E₃ are mutually exclusive,
P(E₁ ∪ E₃) = P(E₁) + P(E₃) = …
(b) P(E₁ ∪ E₂) = P(E₁) + P(E₂) − P(E₁ ∩ E₂)
where P(E₁ ∩ E₂) = P(E₁) × P(E₂), since E₁ and E₂ are independent.
Substituting the given values gives a linear equation in P(E₂), so P(E₂) = …


Example 3.23
Two ordinary fair dice, one red and one blue, are to be rolled once.
(a) Find the probabilities of the following events:
Event A: the number showing on the red die will be a 5 or a 6.
Event B: the total of the numbers showing on the two dice will be 7.
Event C: the total of the numbers showing on the two dice will be 8.
(b) State, with a reason, which two of the events A, B and C are mutually exclusive.
(c) Show that the events A and B are independent. (NEAB)

Solution 3.23
(a) [Diagram: the possibility space of 36 outcomes, score on red die against score on blue die]
There are 36 equally likely outcomes, so n(S) = 36.
n(A) = 12, so P(A) = 12/36 = 1/3
n(B) = 6, so P(B) = 6/36 = 1/6
n(C) = 5, so P(C) = 5/36
(b) It is not possible to score 7 and 8 with one throw of the dice, so events B and C cannot occur together.
Events B and C are mutually exclusive.
(c) There are two ways to score 7 with the red die showing 5 or 6. These are (5, 2) and (6, 1).
So n(A and B) = 2 and P(A and B) = 2/36 = 1/18.
But P(A) × P(B) = 1/3 × 1/6 = 1/18.
So P(A and B) = P(A) × P(B).
Events A and B are independent.
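Solution 3.23 can be confirmed by listing all 36 (red, blue) outcomes. In this Python sketch the event definitions are written as small predicates; it checks that B and C never occur together (mutually exclusive) and that A and B satisfy the multiplication rule (independent), illustrating that the two ideas are quite different.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (red, blue) outcomes.
S = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability that `event` holds, counting equally likely outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

A = lambda o: o[0] in (5, 6)          # red die shows 5 or 6
B = lambda o: o[0] + o[1] == 7        # total is 7
C = lambda o: o[0] + o[1] == 8        # total is 8

print(P(A), P(B), P(C))                            # 1/3 1/6 5/36
print(P(lambda o: B(o) and C(o)))                  # 0, so B and C are mutually exclusive
print(P(lambda o: A(o) and B(o)) == P(A) * P(B))   # True, so A and B are independent
```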
Exercise 3c Combined events


1. A number is picked at random from the digits
1, 2, ..., 9. Given that the number is a multiple of
3, find the probability that the number is
(a) even, (b) a multiple of 4.

2. In a large group of people it is known that 10%
have a hot breakfast, 20% have a hot lunch and
25% have a hot breakfast or a hot lunch. Find
the probability that a person chosen at random
from this group
(a) has a hot breakfast and a hot lunch,
(b) has a hot lunch, given that the person
chosen had a hot breakfast. (L)

3. If events A and B are such that they are
independent and P(A) = 0.3, P(B) = 0.5, find
(a) P(A ∩ B), (b) P(A ∪ B).
Are events A and B mutually exclusive?

4. If P(A|B) = …, P(B) = …, P(A) = …, find
(a) P(B|A), (b) P(A ∩ B).

5. A die is thrown twice. Find the probability of
obtaining a number less than three on both
throws.

6. Events A and B are such that P(A) = …,
P(A|B) = …, P(B) = ….
Find (a) P(B|A), (b) P(A).

7. A card is picked at random from a pack of 20
cards numbered 1, 2, 3, ..., 20. Given that the
card shows an even number, find the probability
that it is a multiple of 4.

8. In a group of 100 people, 40 own a cat, 25 own
a dog and 15 own a cat and a dog. Find the
probability that a person chosen at random
(a) owns a dog or a cat,
(b) owns a dog or a cat, but not both,
(c) owns a dog, given that he owns a cat,
(d) does not own a cat, given that he owns a dog.


9. A card is picked from a pack containing 52
playing cards. It is then replaced and a second
card is picked. Find the probability that 
(a) both cards are the seven of diamonds, 

(b) the first card is a heart and the second a 
spade, 

(c) one card is from a black suit and the other is 
from a red suit, 

(d) at least one card is a Queen. 


10. A student investigating success in driving tests
gathered information from 60 students in her
school. Of these students, 25 were girls and 35
were boys. She found that 37 of the students had
already taken a driving test, whilst 5, including 3
girls, were too young to take a driving test. Of
the 37 who had taken a test, 16 boys and 8 girls
had passed their test. The remainder, including
6 girls, had failed their test.

(a) Copy and complete the table.

                                    Boys   Girls
Passed driving test                 16     8
Taken driving test, but failed             6
Learning, but not yet taken a
driving test
Too young to take a driving test

Use your table to find the probability that
(b) a student chosen at random has failed a
driving test,
(c) a girl chosen at random has taken a driving
test,
(d) a boy chosen at random has not yet taken a
driving test,
(e) 2 students, chosen at random, are both too
young to take a driving test,
(f) a boy and a girl, each chosen at random,
have both passed their driving test. (C)

11. (a) Given that two events, A and B, are such
that P(A and B) = P(A) × P(B), state what
you can say about the events A and B.
If event A is 'obtaining a 6 on a single throw
of a die', suggest a possible description for
event B.

(b) Given that two events, C and D, are
such that P(C or D) = P(C) + P(D), state
what you can say about the two events
C and D.

Write down the value of P(C and D). (C)


12. The probability that a person in a particular
evening class is left-handed is …. From a class of
15 women and 5 men a person is chosen at
random. Assuming that 'left-handedness' is
independent of the sex of a person, find the
probability that the person chosen is a man or is
left-handed.

13. A and B are exhaustive events and it is known
that P(A|B) = … and P(B) = …. Find P(A).

14. A bag contains four red counters and six black
counters. A counter is picked at random from the
bag and not replaced. A second counter is then
picked. Find the probability that
(a) the second counter is red, given that the first
counter is red,
(b) both counters are red,
(c) the counters are of different colours.

15. A and B are two independent events such that
P(A) = 0.2 and P(B) = 0.15.
Evaluate the following probabilities.
(a) P(A|B), (b) P(A ∩ B), (c) P(A ∪ B). (L)
16. Two events A and B are such that
P(A) = …, P(B) = …, P(A|B) = ….
Find the probabilities that
(a) both events occur,
(b) only one of the two events occurs,
(c) neither event occurs. (NEAB)

17. All the answers to this question should be given
either as fractions in their lowest terms or as
decimals correct to three significant figures.
(a) A man draws one card at random from a
pack of 52 playing cards, replaces it and then
draws another card at random from the pack.
Calculate the probability that
(i) both cards are clubs,
(ii) exactly one of the cards is a Queen,
(iii) the two cards are identical.
(b) On another occasion the man draws
simultaneously two cards at random from
the pack of cards.
Calculate the probability that
(i) …,
(ii) the two cards are identical. (C)

18. (a) The probability that an event A occurs is
P(A) = 0.4. B is an event independent of A
and the probability of the union of A and B
is …. Find P(B).
(b) C and D are two events such that
P(D|C) = … and P(C|D) = ….
Given that P(C ∩ D) = …, find
(i) P(C), (ii) P(D).

19. Events A and B are such that …, P(B) = 0.25.
If …, find ….

20. Two … dice are thrown and the number on which
each lands is noted. The score is the sum of the
two numbers. Find the probability that (a) the
score is even, given that at least one die lands on
a three, (b) at least one die lands on a three,
given that the score is even.

21. Events C and D are such that P(C) = …,
P(D) = …, P(C|D) = ….
Find (a) P(C ∩ D), ….

22. Two athletes, A and B, are attempting to qualify
for an international competition in both the
5000 m and 10 000 m races. The probabilities of
each qualifying are shown in the following table.

Athlete   5000 m   10 000 m
A         …        …
B         …        …

Assuming that the probabilities are independent,
calculate the probability that
(a) athlete A will qualify for both races,
(b) … of the athletes qualifies for the … race,
(c) both athletes qualify only for the 10 000 m
race.


PROBABILITY TREES


A useful way of tackling many probability problems is to draw a probability tree. The method is shown in the examples that follow.


Example 3.24 


In a certain selection of flower seeds 2/3 have been treated to improve germination and 1/3 have been left untreated. The seeds which have been treated have a probability of germination of 0.8, whereas the untreated seeds have a probability of germination of 0.5.


(a) Find the probability that a seed, selected at random, will germinate. 


The seeds were sown and given time to germinate. 


(b) Find the probability that a seed selected at random had been treated, given that it had germinated. (L)


Solution 3.24


Events
T: seed is treated          P(T) = 2/3, P(T') = 1/3
G: seed germinates          P(G|T) = 0.8, P(G|T') = 0.5

Treated or not      Germinates or not      End result
T  P(T) = 2/3       G  P(G|T) = 0.8        P(T and G) = P(T) × P(G|T) = 2/3 × 0.8
                    G' P(G'|T) = 0.2       P(T and G') = P(T) × P(G'|T) = 2/3 × 0.2
T' P(T') = 1/3      G  P(G|T') = 0.5       P(T' and G) = P(T') × P(G|T') = 1/3 × 0.5
                    G' P(G'|T') = 0.5      P(T' and G') = P(T') × P(G'|T') = 1/3 × 0.5


How to use the tree:


(i) Multiply the probabilities along the branches to get the end results, so for the first outcome, use the fact that P(T and G) = P(T) × P(G given T).
(ii) On any set of branches that meet at a point, the probabilities must add up to 1, e.g. 0.8 + 0.2 = 1.
(iii) Check that all the end results add up to 1.
(iv) To answer any questions, find the relevant end results. If more than one satisfies the requirements, add these end results together.


In practice you would usually label your tree more simply as follows.


Treated or not   Germinates or not   End result
T  (2/3)         G  (0.8)            P(T ∩ G) = 2/3 × 0.8 *
                 G' (0.2)            P(T ∩ G') = 2/3 × 0.2
T' (1/3)         G  (0.5)            P(T' ∩ G) = 1/3 × 0.5
                 G' (0.5)            P(T' ∩ G') = 1/3 × 0.5


(a) P(G) = P(T ∩ G) + P(T' ∩ G)
= 2/3 × 0.8 + 1/3 × 0.5
= 0.7

(b) P(T given G) = P(T and G)/P(G)    ← marked * above
= (2/3 × 0.8)/0.7
= 0.762 (3 d.p.)
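The tree calculation can be checked with a short Python sketch. It uses exact fractions so that rule (iii), that all the end results add up to 1, can be tested without rounding error; the variable names are illustrative only.

```python
from fractions import Fraction

# Probabilities from Example 3.24 (exact fractions avoid rounding error)
p_treated = Fraction(2, 3)
p_untreated = 1 - p_treated
p_g_given_t = Fraction(8, 10)
p_g_given_u = Fraction(5, 10)

# Multiply along the branches to get the four end results
end_results = {
    "T and G": p_treated * p_g_given_t,
    "T and G'": p_treated * (1 - p_g_given_t),
    "T' and G": p_untreated * p_g_given_u,
    "T' and G'": p_untreated * (1 - p_g_given_u),
}
assert sum(end_results.values()) == 1  # rule (iii)

# (a) add the end results in which the seed germinates
p_g = end_results["T and G"] + end_results["T' and G"]

# (b) restrict attention to the outcomes in which G occurs
p_t_given_g = end_results["T and G"] / p_g

print(p_g)                            # 7/10
print(round(float(p_t_given_g), 3))   # 0.762
```

The exact answer to (b) is 16/21, which rounds to 0.762.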


Example 3.25 


A manufacturer makes writing pens. The manufacturer employs an inspector to check the quality of his product. The inspector tested a random sample of the pens from a large batch and calculated the probability of any pen being defective as 0.025.


Carmel buys two of the pens made by the manufacturer.


(a) Calculate the probability that both pens are defective.
(b) Calculate the probability that exactly one of the pens is defective. (C)


Solution 3.25
D: a pen is defective, P(D) = 0.025, P(D') = 1 − 0.025 = 0.975

First pen    Second pen    End result
D  (0.025)   D  (0.025)    P(D ∩ D) = 0.025 × 0.025 *
             D' (0.975)    P(D ∩ D') = 0.025 × 0.975 †
D' (0.975)   D  (0.025)    P(D' ∩ D) = 0.975 × 0.025 †
             D' (0.975)    P(D' ∩ D') = 0.975 × 0.975


(a) P(both pens are defective) = P(D ∩ D)    ← marked *
= 0.025 × 0.025
= 0.000 625
(b) P(exactly one pen is defective) = P(D ∩ D') + P(D' ∩ D)    ← marked †
= 0.025 × 0.975 + 0.975 × 0.025
= 0.048 75
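One way to see the tree is as a list of every (first pen, second pen) path. The sketch below, a minimal illustration that assumes, as the solution does, that the two pens are independent, enumerates all four paths with `itertools.product` and adds the relevant end results.

```python
from itertools import product

P_DEFECTIVE = 0.025
prob = {"D": P_DEFECTIVE, "D'": 1 - P_DEFECTIVE}

# Each path through the tree is a pair (first pen, second pen);
# with independent pens, multiply along the branches.
paths = {(a, b): prob[a] * prob[b] for a, b in product(prob, repeat=2)}

p_both = paths[("D", "D")]
p_exactly_one = paths[("D", "D'")] + paths[("D'", "D")]

print(round(p_both, 6))         # 0.000625
print(round(p_exactly_one, 5))  # 0.04875
```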


Example 3.26 


Events X and Y are such that P(X') = …, P(Y|X') = …, P(Y'|X) = ….
By drawing a tree diagram, find


(a) P(Y), (b) P(X'|Y).
Solution 3.26


Draw a tree diagram, showing event X followed by event Y, and write in all the given probabilities. Then work out the missing probabilities using the fact that the probabilities on all the branches from a point add up to 1.

[Tree diagram: X and X' on the first set of branches, each followed by Y and Y']

(a) P(Y) = P(X ∩ Y) + P(X' ∩ Y)
= P(X) × P(Y|X) + P(X') × P(Y|X')
= …

(b) P(X'|Y) × P(Y) = P(Y|X') × P(X')
so P(X'|Y) = P(Y|X') × P(X')/P(Y)
= …

Alternatively
P(X'|Y) = P(X' and Y)/P(Y)
= …


Example 3.27 
When a person needs a minicab, it is hired from one of three firms, X, Y and Z. Of the hirings 40% are from X, 50% are from Y and 10% are from Z. For cabs hired from X, 9% arrive late, the corresponding percentages for cabs hired from firms Y and Z being 6% and 20% respectively. Calculate the probability that the next cab hired
(a) will be from X and will not arrive late,
(b) will arrive late.
Given that a call is made for a minicab and that it arrives late, find, to three decimal places, the probability that it came from Y. (L)
Solution 3.27


Events                  Probabilities
X: cab is from X        P(X) = 0.4
Y: cab is from Y        P(Y) = 0.5
Z: cab is from Z        P(Z) = 0.1
L: cab is late          P(L|X) = 0.09, P(L|Y) = 0.06, P(L|Z) = 0.2


Firm    Late or not    End result
X       L              P(X ∩ L) = 0.4 × 0.09 = 0.036 †
        L'             P(X ∩ L') = 0.4 × 0.91 = 0.364 *
Y       L              P(Y ∩ L) = 0.5 × 0.06 = 0.03 †
        L'             P(Y ∩ L') = 0.5 × 0.94 = 0.47
Z       L              P(Z ∩ L) = 0.1 × 0.2 = 0.02 †
        L'             P(Z ∩ L') = 0.1 × 0.8 = 0.08

(a) P(from X and not late) = P(X ∩ L') = 0.364    ← marked *
(b) P(arrives late) = P(X and late) + P(Y and late) + P(Z and late)
= P(X ∩ L) + P(Y ∩ L) + P(Z ∩ L)    ← marked †
= 0.036 + 0.03 + 0.02
= 0.086


The possibility space is now reduced to the outcomes in which the cab arrives late, where P(L) = 0.086 (part (b)).


P(from Y given it was late) = P(Y and late)/P(late)
i.e. P(Y|L) = 0.03/0.086 = 0.349 (3 d.p.)
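The same 'reverse the conditions' calculation can be sketched in Python. The helper name `reverse_conditions` is hypothetical, chosen for this illustration: it forms the 'late' end results, adds them to get P(L), and then divides each one by P(L), exactly as in the reduced possibility space above.

```python
def reverse_conditions(priors, likelihoods):
    """Hypothetical helper: return P(L) and P(firm | L) for each firm.

    priors[k]      -- P(firm k)
    likelihoods[k] -- P(L | firm k)
    """
    # End results of the 'late' branches: P(firm and L) = P(L | firm) x P(firm)
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    # P(L): add the relevant end results
    p_l = sum(joint.values())
    # Reduced possibility space: divide each end result by P(L)
    posterior = {k: joint[k] / p_l for k in joint}
    return p_l, posterior


priors = {"X": 0.4, "Y": 0.5, "Z": 0.1}
late = {"X": 0.09, "Y": 0.06, "Z": 0.2}

p_late, p_firm_given_late = reverse_conditions(priors, late)
print(round(p_late, 3))                  # 0.086
print(round(p_firm_given_late["Y"], 3))  # 0.349
```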


BAYES’ THEOREM 


P(Y|L) is easy to find from the tree diagram once you realise that the sample space has been reduced to the outcomes in which L occurs. This is a useful method when you want to 'reverse the conditions', as in Example 3.27, where you knew P(L|Y) and wanted P(Y|L).
It is interesting to write out the full formulae used:


P(Y and L) = P(L|Y) × P(Y)
P(Y and L) = P(Y|L) × P(L)


so P(Y|L) × P(L) = P(L|Y) × P(Y)


P(Y|L) = P(L|Y) × P(Y)/P(L)


But P(L) = P(X ∩ L) + P(Y ∩ L) + P(Z ∩ L)
= P(L|X) × P(X) + P(L|Y) × P(Y) + P(L|Z) × P(Z)


so P(Y|L) = P(L|Y) × P(Y)/[P(L|X) × P(X) + P(L|Y) × P(Y) + P(L|Z) × P(Z)]


This is an example of Bayes' Theorem, which can be written in general form as follows:


For i = 1, 2, 3, ..., n,


P(Aᵢ|B) = P(B|Aᵢ) × P(Aᵢ)/[P(B|A₁) × P(A₁) + P(B|A₂) × P(A₂) + ... + P(B|Aₙ) × P(Aₙ)]

where A₁, A₂, ..., Aₙ are mutually exclusive events that together cover the whole possibility space.

The formula has been included here for reference. It is, however, easier to work from the format

P(Aᵢ|B) = P(Aᵢ and B)/P(B)

especially when you have a tree diagram to illustrate the situation!


Example 3.28


A computer program generates random questions in arithmetic that children have to answer within a fixed time. The probability of the first question being answered correctly is 0.8. Whenever a question is answered correctly, the next question generated is more difficult, and the probability of a correct answer being given is reduced by 0.1. Whenever a question is answered wrongly, the next question is of the same standard, and the probability of a correct answer being given remains unchanged. The following tree diagram shows this information for the first two questions generated.


1st question     2nd question
Correct (0.8)    Correct (0.7)
                 Wrong (0.3)
Wrong (0.2)      Correct (0.8)
                 Wrong (0.2)


(a) Find the probability that the second question is answered correctly.
(b) By extending the tree diagram, or otherwise, find the probability that the second question is answered correctly given that the third question is answered correctly. (C)


Solution 3.28


Let Cᵢ be the event that the ith question is answered correctly and Wᵢ the event that it is answered wrongly.


P(C₁ ∩ C₂) = 0.8 × 0.7 = 0.56
P(C₁ ∩ W₂) = 0.8 × 0.3 = 0.24
P(W₁ ∩ C₂) = 0.2 × 0.8 = 0.16
P(W₁ ∩ W₂) = 0.2 × 0.2 = 0.04


(a) P(2nd question answered correctly) = 0.56 + 0.16 = 0.72


(b) Extend the tree to the third question. The branches on which the third question is answered correctly are

P(C₁ ∩ C₂ ∩ C₃) = 0.8 × 0.7 × 0.6 = 0.336 *
P(C₁ ∩ W₂ ∩ C₃) = 0.8 × 0.3 × 0.7 = 0.168
P(W₁ ∩ C₂ ∩ C₃) = 0.2 × 0.8 × 0.7 = 0.112 *
P(W₁ ∩ W₂ ∩ C₃) = 0.2 × 0.2 × 0.8 = 0.032

The end results marked * are those in which C₂ also occurs.

P(C₂|C₃) = P(C₂ ∩ C₃)/P(C₃)
= (0.336 + 0.112)/(0.336 + 0.168 + 0.112 + 0.032)
= 0.448/0.648
= 0.69 (2 s.f.)
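Extending a tree by hand gets tedious as the number of questions grows, so here is a minimal Python sketch that builds every path for three questions using the rule in Example 3.28 (a correct answer lowers the next probability by 0.1; a wrong answer leaves it unchanged). The function name `path_probability` is illustrative, not from the text.

```python
from fractions import Fraction
from itertools import product

FIRST = Fraction(8, 10)  # P(first question correct)
STEP = Fraction(1, 10)   # reduction after each correct answer


def path_probability(answers):
    """Probability of a sequence such as ('C', 'W', 'C') along the tree."""
    p_correct = FIRST
    prob = Fraction(1)
    for a in answers:
        if a == "C":
            prob *= p_correct
            p_correct -= STEP       # next question is harder
        else:
            prob *= 1 - p_correct   # next question is of the same standard
    return prob


paths = {seq: path_probability(seq) for seq in product("CW", repeat=3)}

# (a) second question correct (the third answer is summed out)
p_c2 = sum(p for seq, p in paths.items() if seq[1] == "C")
# (b) P(C2 | C3) = P(C2 and C3) / P(C3)
p_c3 = sum(p for seq, p in paths.items() if seq[2] == "C")
p_c2_c3 = sum(p for seq, p in paths.items() if seq[1] == "C" and seq[2] == "C")

print(float(p_c2))                       # 0.72
print(round(float(p_c2_c3 / p_c3), 2))   # 0.69
```

The exact value of P(C₂|C₃) is 56/81.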
Exercise 3d Tree diagrams


Section A 


1. The probability that I am late for work is 0.05.
Find the probability that, on two consecutive
mornings, (a) I am late for work twice, (b) I am
late for work once.


2. A mother and her daughter both enter the cake
competition at a show. The probability that the
mother wins a prize is … and the probability that
her daughter wins a prize is ….

Assuming that the two events are independent,
find the probability that

(a) either the mother, or the daughter, but not 
both, wins a prize, 

(b) at least one of them wins a prize. 


3. In a restaurant 40% of the customers choose
steak for their main course. If a customer
chooses steak, the probability that he will choose
ice cream to follow is 0.6. If he does not have
steak, the probability that he will choose ice
cream is 0.3. Find the probability that a
customer picked at random will choose
(a) steak and ice cream,

(b) ice cream.

4. A box contains six red pens and three blue pens.
(a) A pen is selected at random, the colour is
noted and the pen is returned to the box.
This procedure is performed a second, then a
third time. Find the probability of obtaining
(i) three red pens,
(ii) two red pens and one blue pen, in any
order,
(iii) more than one blue pen.
(b) Repeat (a) but this time find the probabilities
if, at each selection, the pen is not returned to
the box.


5. Mass-produced glass bricks are inspected for 
defects. The probability that a brick has air 
bubbles is 0.002. If a brick has air bubbles the 
probability that it is also cracked is 0.5 while the 
probability that a brick free of air bubbles is 
cracked is 0.005. What is the probability that a
brick chosen at random is cracked? The
probability that a brick is discoloured is 0.006.
Given that discolouration occurs independently
of the other two defects, find the probability that
a brick chosen at random has no defects. (O & C)


6. In each round of a certain game a player can 
score 1, 2 or 3 only. Copy and complete the table 
which shows the scores and two of the respective 
probabilities of these being scored in a single 
round. 


Score         1    2    3
Probability   …    …    …


Draw a tree diagram to show all the possible 
total scores and their respective probabilities 
after a player has completed two rounds. 

Find the probability that a player has (a) a score 
of 4 after two rounds, (b) an odd number score 
after two rounds. (L Additional) 


7. The probability that I have to wait at the traffic
lights on my way to school is 0.25.

Find the probability that, on two consecutive
mornings, I have to wait on at least one morning.


8. A die is thrown three times. What is the
probability of scoring a two on just one occasion?


9. A coin is tossed four times. Find the probability
of obtaining less than two heads.


10. Two golfers, Smith and Jones, are attempting to
qualify for a golf championship. It is estimated
that the probability of Jones qualifying is 0.8,
and that the probability of both Smith and Jones
qualifying is 0.6. Given that the probability of
Smith qualifying and the probability of Jones
qualifying are independent, find the probability
that only one of them will qualify. (C)


11. Whether or not Jonathan gets up in time for
school depends on whether he remembers to set 
his alarm clock the evening before. 

For 85% of the time he remembers to set the 
clock; the other 15% of the time he forgets. 

If the clock is set, he gets up in time for school 
on 90% of the occasions. 

If the clock is not set, he does not get up in time 
for school on 60% of the occasions. 


On what proportion of the occasions does he get 
up in time for school? (NEAB) 


12. In a game, a steel ball is dropped onto a set of
nails arranged in three levels as shown.

When a ball hits a nail, the probability of it
moving right or left before reaching the next
level is 1/2.

[Diagram: nails arranged in three levels, showing positions A and B and slot C]

Calculate the probability of a ball
(a) reaching A,
(b) reaching B,
(c) dropping into slot C. (NEAB)


13. A team needs to win at least two of its remaining 
three games to secure the championship. The 
probabilities that the team will win the games are 
assessed to be 0.6, 0.7 and 0.8, respectively. 
Calculate the probability, based on these assessed 
values, that the team will secure the
championship. (C)


14. In the game of tennis a player has two serves. 
If the first serve is successful the game continues. 


If the first serve is not successful the player serves 
again. If this second service is successful the 
game continues. 


If both serves are unsuccessful the player has 
served a ‘double fault’ and loses the point. 


Gabriella plays tennis. She is successful with 60% 
of her first serves and 95% of her second serves. 


(a) Calculate the probability that Gabriella 
serves a double fault. 


If Gabriella is successful with her first serve she
has a probability of 0.75 of winning the point.

If she is successful with her second serve she has
a probability of 0.5 of winning the point.

(b) Calculate the probability that Gabriella wins 
the point. (MEG) 


15. In a group of 12 international referees there are 
three from Africa, four from Asia and five from 
Europe. To officiate at a tournament, three 
referees are chosen at random from the group. 
Calculate the probability that 


(a) a referee is chosen from each continent, 

(b) exactly two referees are chosen from Asia,

(c) the three referees are chosen from the same 
continent. (C) 


16. A bag contains seven black and three white 
marbles. Three marbles are chosen at random 
and in succession, each marble being replaced 
after it has been taken out of the bag. 


Draw a tree diagram to show all possible 
selections. 


From your diagram, or otherwise, calculate, to
two significant figures, the probability of
choosing

(a) three black marbles, 

(b) a white marble, a black marble and a white 
marble in that order, 

(c) two white marbles and a black marble in
any order,

(d) at least one black marble.


State an event from this experiment which 
together with the event described in (d) would be 
both exhaustive and mutually exclusive. (L) 


17. Alec and Bill frequently play each other in a series
of games of table tennis. Records of the outcomes
of these games indicate that whenever they play a
series of games, Alec has the probability 0.6 of
winning the first game and that in every
subsequent game in the series, Alec's probability
of winning the game is 0.7 if he won the
preceding game but only 0.5 if he lost the
preceding game. A game cannot be drawn. Find
the probability that Alec will win the third game
in the next series he plays with Bill. (NEAB)


18. Three men, A, B and C agree to meet at the
theatre. The man A cannot remember whether 
they agreed to meet at the Palace or the Queen’s 
and tosses a coin to decide which theatre to go 
to. The man B also tosses a coin to decide 
between the Queen’s and the Royalty. The man 
C tosses a coin to decide whether to go to the 
Palace or not and in this latter case he tosses 
again to decide between the Queen’s and the 
Royalty. Find the probability that 

(a) A and B meet,

(b) B and C meet,

(c) A, B and C all meet,

(d) A, B and C all go to different places,

(e) at least two meet. (C)


Section B 


1. I travel to work by route A or route B. The
probability that I choose route A is …. The
probability that I am late for work if I go via
route A is … and the corresponding probability if
I go via route B is ….

(a) What is the probability that I am late for
work on Monday?

(b) Given that I am late for work, what is the
probability that I went via route B?


2. A box contains 20 chocolates, of which 15 have
soft centres and five have hard centres. Two 
chocolates are taken at random, one after the 
other. Calculate the probability that 


(a) both chocolates have soft centres, 

(b) one of each sort of chocolate is taken, 

(c) both chocolates have hard centres, given that 
the second chocolate has a hard centre. (C) 


3. (a) Explain in words the meaning of the symbol
P(A|B) where A and B are two events. State
the relationship between A and B when
(i) P(A|B) = 0, (ii) P(A|B) = P(A).

(b) When a car owner needs her car serviced she
phones one of three garages, A, B or C. Of
her phone calls to them, 30% are to garage
A, 10% to B and 60% to C.


The percentages of occasions when the
garage phoned can take the car in on the
day of phoning are 20% for A, 6% for B
and 9% for C.


Find the probability that the garage phoned
will not be able to take the car in on the day
of phoning.


Given that the car owner phones a garage 
and the garage can take her car in on that 
day, find the probability that she phoned 
garage B. (L) 
4. A shop stocks tinned cat food of two makes, A
and B, and two sizes, large and small.


Of the stock, 70% is of brand A, 30% is of 
brand B. 


Of the tins of brand A, 30% are small size whilst
of the tins of brand B, 40% are small size. 


Using a tree diagram, or otherwise, find the 
probability that 


(a) a tin chosen at random from the stock will 
be of small size, 

(b) a small tin chosen at random from the stock 
will be of brand A. (L) 


5. A die is known to be biased in such a way that,
when it is thrown, the probability of a six
showing is …. This biased die and an ordinary fair
die are thrown. Find the probability that


(a) the fair die shows a six and the biased die
does not show a six,

(b) at least one of the two dice shows a six,

(c) exactly one of the two dice shows a six,
given that at least one of them shows a six.


6. A golfer observes that, when playing a particular
hole at his local course, he hits a straight drive
on 80% of the occasions when the weather is not
windy but only on 30% of the occasions when
the weather is windy. Local records suggest that
the weather is windy on 55% of all days.


(a) Show that the probability that, on a
randomly chosen day, the golfer will hit a
straight drive at the hole is 0.525.

(b) Given that he fails to hit a straight drive at 
the hole, calculate the probability that the 
weather is windy. (NEAB) 


7. In my bookcase there are four shelves and the
number of books on each shelf is as shown in the
table:

          Hardback   Paperback
Shelf 1   11         9
Shelf 2   8          12
Shelf 3   16         4
Shelf 4   9          3


(a) If I choose a book at random, irrespective of
its position in the bookcase, what is the
probability that it is a paperback?

(b) I am equally likely to choose any shelf. I
choose a shelf at random and then choose a
book. (i) What is the probability that it is a
hardback? (ii) If the book chosen is a
hardback, what is the probability that it is
from shelf 3?


8. Of a group of pupils studying at A-level in
schools in a certain area, 56% are boys and 44%
are girls. The probability that a boy of this group
is studying Chemistry is … and the probability
that a girl of this group is studying Chemistry
is ….
(a) Find the probability that a pupil selected at
random from this group is a girl studying
Chemistry.
(b) Find the probability that a pupil selected at
random from this group is not studying
Chemistry.
(c) Find the probability that a Chemistry pupil
selected at random from this group is male.


(You may leave your answers as fractions in their
lowest terms.) (O & C)


9. Explain, by suitably defining events A and B,
what is meant by 'the probability of A occurring
given that B has occurred'.


A local greengrocer sells conventionally grown 
and organically grown vegetables. 


Conventionally grown vegetables constitute 80% 
of his sales; carrots constitute 12% of the 
conventional sales and 30% of the organic sales. 


Display this information in an appropriately and 
accurately labelled tree diagram. 


One day a customer emerges from the shop and 
is questioned about her purchases. What is the 
probability that she bought 


(a) conventionally grown carrots,
(b) carrots? 


Given that she did buy carrots, what is the 
probability that they were organically grown? 
What assumptions have you made in answering
this question? (O) 


10. In a simple model of the weather in October,
each day is classified as either fine or rainy. The
probability that a fine day is followed by a fine 
day is 0.8. The probability that a rainy day is 
followed by a fine day is 0.4. The probability 
that 1 October is fine is 0.75. 


(a) Find the probability that 2 October is fine 
and the probability that 3 October is fine. 
(b) Find the conditional probability that 
3 October is rainy, given that 1 October is 
fine. 
(c) Find the conditional probability that
1 October is fine, given that 3 October is 
rainy. (C) 


11. At the ninth hole on a certain golf course there is
a pond. A golfer hits a grade B ball into the
pond. Including the golfer's ball there are then
six grade C, ten grade B and four grade A balls
in the pond. The golfer uses a fishing net and
'catches' four balls. The events X, Y and Z are
defined as follows:


X: the catch consists of two grade A balls and
two grade C balls

Y: the catch consists of two grade B balls and
two other balls

Z: the catch includes the golfer's own ball


Assuming that the catch is a random selection
from the balls in the pond, determine

(a) P(X), (b) P(Y), (c) P(Z), (d) P(Z|Y).

For each of the pairs X and Y, Y and Z, state,
with a brief reason, whether the two events are
(i) mutually exclusive, (ii) independent. (C)


12. [In this question, give your answers in decimal
form, correct to three significant figures.]


A choir has seven sopranos, six altos, three
tenors and four basses. The sopranos and altos
are women and the tenors and basses are men.
At a particular rehearsal, three members of the
choir are chosen at random to make the tea.


(a) Find the probability that all three tenors are
chosen.

(b) Find the probability that exactly one bass is
chosen.

(c) Find the conditional probability that two 
women are chosen, given that exactly one 
bass is chosen. 

(d) Find the probability that the chosen group 
contains exactly one tenor or exactly one 
bass (or both). (C) 


13. Vehicles approaching a crossroads must go in
one of three directions — left, right or straight on. 
Observations by traffic engineers showed that of 
vehicles approaching from the north, 45% turn 
left, 20% turn right and 35% go straight on. 
Assuming that the driver of each vehicle chooses 
direction independently, what is the probability 
that of the next three vehicles approaching from 
the north 


(a) all go straight on, 

(b) all go in the same direction, 

(c) two turn left and one turns right, 
(d) all go in different directions,

(e) exactly two turn left? 


Given that three consecutive vehicles all go in the 
same direction, what is the probability that they 
all turned left? (AEB) 


14. During an epidemic of a certain disease a doctor
is consulted by 110 people suffering from
symptoms commonly associated with the
disease. Of the 110 people, 45 are female of
whom 20 actually have the disease and 25 do
not. Fifteen males have the disease and the rest
do not.


(a) A person is selected at random. The event
that this person is female is denoted by A
and the event that this person is suffering
from the disease is denoted by B.
Evaluate (i) P(A), (ii) P(A ∪ B),
(iii) P(A ∩ B), (iv) P(A|B).

(b) If three different people are selected at 
random without replacement, what is the 
probability of (i) all three having the disease, 
(ii) exactly one of the three having the 
disease, (iii) one of the three being a female 
with the disease, one a male with the disease 
and one a female without the disease? 

(c) Of people with the disease 96% react 
positively to a test for diagnosing the disease 
as do 8% of people without the disease. 
What is the probability of a person selected 
at random (i) reacting positively, (ii) having 
the disease given that he or she reacted 
positively? (AEB) 


15. In an experiment two bags A and B, containing
red and green marbles are used. Bag A contains 
four red marbles and one green marble and bag 
B contains two red marbles and seven green 
marbles. An unbiased coin is tossed. If a head 
turns up, a marble is drawn at random from bag 
A while if a tail turns up, a marble is drawn at 
random from bag B. Calculate the probability 
that a red marble is drawn in a single trial. Given 
that a red marble is selected, calculate the 
probability that when the coin was tossed a head 
was obtained. (L) 


16. In a computer game played by a single player,
the player has to find, within a fixed time, the
path through a maze shown on the computer
screen. On the first occasion that a particular
player plays the game, the computer shows a
simple maze, and the probability that the player
succeeds in finding the path in the time allowed
is …. On subsequent occasions, the maze shown
depends on the result of the previous game. If the
player succeeded on the previous occasion, the
next maze is harder, and the probability that the
player succeeds is one half of the probability of
success on the previous occasion. If the player
failed on the previous occasion, a simple maze is
shown and the probability of the player
succeeding is again ….


The player plays three games.


(a) Show that the probability that the player
succeeds in all three games is ….

(b) Find the probability that the player succeeds
in exactly one of the games.

(c) Find the probability that the player does not
have two consecutive successes.

(d) Find the conditional probability that the
player has two consecutive successes given
that the player has exactly two successes. (C)


17. A sailing competition between two boats, A and
B, consists of a series of independent races, the
competition being won by the first boat to win
three races. Every race is won by either A or B,
and their respective probabilities of winning are
influenced by the weather. In rough weather the
probability that A will win is 0.9; in fine weather
the probability that A will win is 0.4. For each
race the weather is either rough or fine, the
probability of rough weather being 0.2. Show
that the probability that A will win the first race
is 0.5.

Given that the first race was won by A,
determine the conditional probability that
(a) the weather for the first race was rough,
(b) A will win the competition.


SOME USEFUL METHODS 


(a) Problems involving an ‘at least’ situation 


Example 3.29 


(a) Find the probability of obtaining at least one six when five dice are thrown. 

(b) Find the probability of obtaining at least one six when n dice are thrown.

(c) How many dice must be thrown so that the probability of obtaining at least one six is at
least 0.99?


Solution 3.29


(a) In one throw P(6) = 1/6 and P(not 6) = 5/6.
When five dice are thrown,

P(at least one six) = 1 − P(no sixes)
= 1 − (5/6)⁵
= 0.598 (3 d.p.)


(b) When n dice are thrown,
P(at least one six) = 1 − (5/6)ⁿ


(c) You need to find n such that

1 − (5/6)ⁿ > 0.99
i.e. (5/6)ⁿ < 0.01


You could do this by trial and improvement:

(5/6)²⁰ = 0.026... > 0.01
(5/6)²⁵ = 0.0104... > 0.01
(5/6)²⁶ = 0.0087... < 0.01

So the least value of n is 26.
26 dice must be thrown.


NOTE: you could solve (5/6)ⁿ < 0.01 using logarithms.
Take logs to the base 10 of both sides:
n log(5/6) < log(0.01)
Divide both sides by log(5/6). Since log(5/6) is negative, this reverses the inequality sign:
n > log(0.01)/log(5/6)
n > 25.3...

The least value of n is 26.
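Both methods in part (c) are easy to mirror in a few lines of Python. This sketch assumes fair dice, as in the example; the loop is the trial-and-improvement method and the division by a negative logarithm is the method in the NOTE.

```python
import math

P_NOT_SIX = 5 / 6


def p_at_least_one_six(n):
    """P(at least one six) = 1 - P(no sixes) when n fair dice are thrown."""
    return 1 - P_NOT_SIX ** n


# (c) Trial and improvement: increase n until (5/6)^n < 0.01
n = 1
while P_NOT_SIX ** n >= 0.01:
    n += 1

# Logarithm method: dividing by the negative log(5/6) reverses the inequality,
# giving n > log(0.01) / log(5/6)
n_bound = math.log10(0.01) / math.log10(P_NOT_SIX)

print(round(p_at_least_one_six(5), 3))  # 0.598
print(n)                                # 26
print(round(n_bound, 1))                # 25.3
```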


(b) Problems involving the use of an infinite geometric progression


Many probability examples involve the use of GPs, and the following formula is required.
If S∞ = a + ar + ar² + ar³ + ... (to infinity),
then

S∞ = a/(1 − r) for |r| < 1, where a is the first term and r is the common ratio.


Example 3.30 


Joe and Pete play a game in which they each throw a die in turn until someone throws a six.
The person who throws the six wins the game. Joe starts the game. Find the probability that
Joe wins the game.


Solution 3.30


Joe will win the game if he wins on his first go, or on his second go, or on his third go, and so on.
P(Joe wins on his first go) = 1/6
P(Joe wins on his second go) = P(Joe doesn't throw a six, Pete doesn't throw a six,
then Joe throws a six)
= 5/6 × 5/6 × 1/6 = (5/6)² × 1/6
P(Joe wins on his third go) = 5/6 × 5/6 × 5/6 × 5/6 × 1/6 = (5/6)⁴ × 1/6, and so on.
P(Joe wins) = 1/6 + (5/6)² × 1/6 + (5/6)⁴ × 1/6 + ...
= 1/6 (1 + (5/6)² + (5/6)⁴ + ...)


Now 1 + (5/6)² + (5/6)⁴ + ... is the sum of an infinite GP with a = 1, r = (5/6)² = 25/36.

S∞ = a/(1 − r) = 1/(1 − 25/36) = 36/11

P(Joe wins) = 1/6 × 36/11 = 6/11
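The GP formula can be checked numerically. In this sketch the 1/6 is kept inside each term, so the GP has first term a = 1/6 and common ratio r = (5/6)², which gives the same result as factoring out 1/6 as above. The helper name `gp_sum_to_infinity` is illustrative; the partial sums simply show the series settling on 6/11.

```python
from fractions import Fraction

p_six = Fraction(1, 6)
p_not = 1 - p_six

# Joe wins on his k-th go with probability (5/6)^(2(k-1)) x 1/6,
# so the terms form a GP with a = 1/6 and r = (5/6)^2.
a = p_six
r = p_not ** 2


def gp_sum_to_infinity(a, r):
    """S_inf = a / (1 - r), valid only for |r| < 1."""
    if abs(r) >= 1:
        raise ValueError("the series does not converge for |r| >= 1")
    return a / (1 - r)


p_joe = gp_sum_to_infinity(a, r)
print(p_joe)  # 6/11

# The first 100 terms already come within 1e-10 of the infinite sum
partial = sum(a * r ** k for k in range(100))
print(float(p_joe - partial) < 1e-10)  # True
```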


Exercise 3e Useful methods 


1. A coin is biased so that the probability that it
falls showing tails is 0.75. 


(a) Find the probability of obtaining at least one 
head when the coin is tossed five times. 

(b) How many times must the coin be tossed so 
that the probability of obtaining at least one
head is greater than 0.98? 


2. A missile is fired at a target and the probability
that the target is hit is 0.7. 


(a) Find how many missiles should be fired so 
that the probability that the target is hit at 
least once is greater than 0.995. 

(b) Find how many missiles should be fired so 
that the probability that the target is not hit 
is less than 0.001. 


3. A die is biased so that the probability of
obtaining a three is p. When the die is thrown 
four times the probability that there is at least 
one three is 0.9375. Find the value of p. 


How many times should the die be thrown so 
that the probability that there are no threes is 
less than 0.03? 


4. On a safe there are four alarms which are
arranged so that any one will sound when
someone tries to break into the safe. The
probability that each alarm will function
properly is 0.85. Find the probability that at least


one alarm will sound when someone tries to 
break into the safe. 


5. For a certain strain of wallflower, the probability 
that, when sown, a seed produces a plant with 
yellow flowers is …. Find the minimum number of
seeds that should be sown in order that the 
probability of obtaining at least one plant with 
yellow flowers is greater than 0.98. (L) 


6. Two people, A and B, play a game. An ordinary 
die is thrown and the first person to throw a four 
wins. A and B take it in turns to throw the die, 
starting with A. Find the probability that B wins. 


7. A, B, C and D throw a coin, in turn, starting
with A. The first to throw a head wins. The
game can continue indefinitely until a head is
thrown. However, D objects because the others
have their first turn before him.


Compare the probability that D wins with the
probability that A wins.


8. A box contains five black balls and one white
ball. Alan and Bill take turns to draw a ball from
the box, starting with Alan. The first boy to
draw the white ball wins the game.


Assuming that they do not replace the balls as
they draw them out, find the probability that Bill
wins the game.


If the game is changed, so that, in the new game,
they replace each ball after it has been drawn
out, find the probabilities that:

(a) Alan wins at his first attempt;

(b) Alan wins at his second attempt;

(c) Alan wins at his third attempt.


Show that these answers are terms in a
Geometric Progression. Hence find the
probability that Alan wins the new game.


9. Two archers A and B shoot alternately at a
target until one of them hits the centre of the
target and is declared the winner.


Independently, A and B have probabilities of
… and …, respectively, of hitting the centre of the
target on each occasion they shoot.


(a) Given that A shoots first, find (i) the
probability that A wins on his second shot,
(ii) the probability that A wins on his third
shot, (iii) the probability that A wins.

(b) Given that the archers toss a fair coin to
determine who shoots first, find the
probability that A wins. (NEAB)


ARRANGEMENTS


In order to calculate the number of possible outcomes in a possibility space or an event, the
following results are often used.


Result 1 


The number of ways of arranging $n$ unlike objects in a line is $n!$


NOTE: $n! = n \times (n-1) \times (n-2) \times \dots \times 3 \times 2 \times 1$


For example, consider the letters A, B, C, D. 


The first letter can be chosen in four ways (either A or B or C or D),
the second letter can be chosen in three ways,
the third letter can be chosen in two ways,
the fourth letter can be chosen in only one way.


Therefore the number of ways of arranging the four letters is $4 \times 3 \times 2 \times 1 = 4! = 24$


On a calculator: [4] [x!] (You may have to use the [SHIFT] key.)


The arrangements are 


ABCD ABDC ACBD ACDB ADCB ADBC
BCDA BCAD BDAC BDCA BACD BADC
CDBA CDAB CABD CADB CBAD CBDA
DABC DACB DBCA DBAC DCAB DCBA
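The 24 arrangements listed above can also be generated and counted by computer. The following is a minimal Python sketch using the standard library's `itertools.permutations`, which yields every ordering of the letters exactly once:

```python
from itertools import permutations
from math import factorial

letters = "ABCD"
# Each tuple produced by permutations() is one ordered arrangement of all four letters
arrangements = ["".join(p) for p in permutations(letters)]

print(len(arrangements))        # 24
print(factorial(len(letters)))  # 4! = 24
print(arrangements[:6])         # ['ABCD', 'ABDC', 'ACBD', 'ACDB', 'ADBC', 'ADCB']
```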


Example 3.31 


A witness reported that a car seen speeding away from the scene of the crime had a number
plate beginning with V or W, the digits were 4, 7 and 8 and the end letters were A, C, E. He
could not, however, remember the order of the digits or the end letters. How many cars would
need to be checked to be sure of including the suspect car?


Solution 3.31 


There are 3! ways of arranging the digits 4, 7, 8 and 
3! ways of arranging the letters A, C, E. 
There are two choices for the initial letter. 
The total number of different plates $= 2 \times 3! \times 3! = 72$
72 cars would need to be checked. 
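As a check on the multiplication principle, the possible plates can be listed by brute force. This Python sketch assumes, purely for illustration, that a plate is written as the initial letter, then the three digits, then the three end letters:

```python
from itertools import permutations

# Hypothetical plate layout for illustration: initial letter + three digits + three end letters
plates = {
    first + "".join(digits) + "".join(ends)
    for first in "VW"                   # two choices of initial letter
    for digits in permutations("478")   # 3! orders of the digits
    for ends in permutations("ACE")     # 3! orders of the end letters
}

print(len(plates))  # 72
```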


Result 2


The number of ways of arranging in a line $n$ objects, of which $p$ are alike, is $\frac{n!}{p!}$


If, instead of the letters A, B, C, D, you have the letters A, A, A, D, then the 24 arrangements
listed previously reduce to the following:


AAAD AADA ADAA DAAA 


So the number of ways of arranging the four objects, of which three are alike, is

$\frac{4!}{3!} = \frac{4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 4$   On a calculator: [4] [x!] [÷] [3] [x!] [=]


The result can be extended as follows: 


The number of ways of arranging in a line $n$ objects of which $p$ of one type are alike, $q$ of a
second type are alike, $r$ of a third type are alike, and so on, is

$\frac{n!}{p!\,q!\,r!\dots}$
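The formula can be checked against a brute-force listing for small cases. In this Python sketch the function names `count_arrangements` and `count_by_listing` are illustrative only; the first applies $n!/(p!\,q!\,r!\dots)$ and the second lists every ordering and removes the duplicates that arise from alike objects:

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def count_arrangements(items) -> int:
    """n! / (p! q! r! ...), where p, q, r, ... are the sizes of the groups of alike items."""
    groups = Counter(items)
    return factorial(len(items)) // prod(factorial(k) for k in groups.values())

def count_by_listing(items) -> int:
    """Brute-force check: list every ordering and keep only the distinct ones."""
    return len(set(permutations(items)))

for word in ["ABCD", "AAAD", "AABBC"]:
    print(word, count_arrangements(word), count_by_listing(word))
# ABCD 24 24
# AAAD 4 4
# AABBC 30 30
```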




Example 3.32 
(a) In how many ways can the letters of the word STATISTICS be arranged? 
(b) If the letters of the word MINIMUM are arranged in a line at random, what is the 
probability that the arrangement begins with MMM? 


Solution 3.32 
(a) Consider the word STATISTICS. 


There are ten letters and S occurs three times,
T occurs three times,
I occurs twice.


Therefore number of ways $= \frac{10!}{3!\,3!\,2!} = 50\,400$

On a calculator: [10] [x!] [÷] [(] [3] [x!] [×] [3] [x!] [×] [2] [x!] [)] [=]

There are 50 400 ways of arranging the letters in the word STATISTICS.
(b) Consider the word MINIMUM. 


The possibility space S = {arrangements of MINIMUM}.

$n(S) = \frac{7!}{3!\,2!} = 420$   (3 Ms, 2 Is)


Let E be the event 'the arrangement begins with MMM'.


The letters must be arranged in the order MMMxxxx. There is only one way of arranging
MMM; then the remaining four letters (I, N, I, U) can be arranged in $\frac{4!}{2!} = 12$ ways.


n(E) = 12

So $P(E) = \frac{n(E)}{n(S)} = \frac{12}{420} = \frac{1}{35}$


The probability that the arrangement begins MMM is $\frac{1}{35}$.
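The same probability can be found by listing every distinct arrangement of MINIMUM. This Python sketch treats each distinct arrangement as equally likely, as the solution does; the set removes the duplicate orderings caused by the repeated Ms and Is:

```python
from fractions import Fraction
from itertools import permutations

# Distinct arrangements of MINIMUM (the set discards repeats caused by alike letters)
space = set(permutations("MINIMUM"))
event = [a for a in space if a[:3] == ("M", "M", "M")]

print(len(space))                        # 420
print(len(event))                        # 12
print(Fraction(len(event), len(space)))  # 1/35
```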


Example 3.33 
Ten pupils are placed at random in a line. What is the probability that the two youngest
pupils are separated?


Solution 3.33 
Let the possibility space be S, then n(S) = 10!
Let E be the event 'the two youngest pupils are together'.
Treating these two together as one item, there are nine items to arrange.


Nine items can be arranged in 9! ways.
The two youngest can be arranged in two ways ($Y_1Y_2$ or $Y_2Y_1$).
Therefore n(E) = 2 × 9!
So $P(E) = \frac{n(E)}{n(S)} = \frac{2 \times 9!}{10!} = 0.2$   On a calculator: [2] [×] [9] [x!] [÷] [10] [x!] [=]
E' is the event 'the two youngest are not together'.
P(E') = 1 - P(E)
= 1 - 0.2 = 0.8
The probability that the two youngest are separated is 0.8. 
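The "glue the pair together" argument generalises to $n$ pupils: $P(\text{together}) = \frac{2 \times (n-1)!}{n!} = \frac{2}{n}$. This Python sketch applies that formula and, because listing all 10! orders is slow, cross-checks it by brute force for a smaller line of six (the labels 0 and 1 for the two youngest are an illustrative assumption):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def p_separated(n: int) -> Fraction:
    """P(two particular people are not adjacent) when n people stand in a line at random."""
    together = 2 * factorial(n - 1)  # glue the pair into one item, then order the pair
    return 1 - Fraction(together, factorial(n))

def p_separated_by_listing(n: int) -> Fraction:
    """Brute-force check: people are 0..n-1, and the two youngest are labelled 0 and 1."""
    lines = list(permutations(range(n)))
    apart = sum(1 for line in lines if abs(line.index(0) - line.index(1)) > 1)
    return Fraction(apart, len(lines))

print(p_separated(10))                            # 4/5
print(p_separated(6), p_separated_by_listing(6))  # 2/3 2/3
```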


Example 3.34 
If a four-digit number is formed from the digits 1, 2, 3 and 5, each digit being used once,
find the probability that the number is divisible by 5.
Solution 3.34 


Let S be the possibility space, then n(S) = 4! = 24. 
Let E be the event ‘the number is divisible by 5’. 
If the number is divisible by 5 then it must end with the digit 5. 


n(E) = number of ways of arranging the digits 1, 2, 3 = 3! = 6

So $P(E) = \frac{n(E)}{n(S)} = \frac{6}{24} = \frac{1}{4}$


The probability that the number is divisible by 5 is $\frac{1}{4}$.
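A quick way to confirm this is to build all 24 numbers and test each one directly. A minimal Python sketch:

```python
from fractions import Fraction
from itertools import permutations

# Every four-digit number using each of 1, 2, 3, 5 exactly once
numbers = [int("".join(p)) for p in permutations("1235")]
divisible = [x for x in numbers if x % 5 == 0]

print(len(numbers), len(divisible))            # 24 6
print(Fraction(len(divisible), len(numbers)))  # 1/4
```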


Example 3.35 


The letters of the word MATHEMATICS are written, one on each of 11 separate cards. The
cards are laid out in a line.


(a) Calculate the number of different arrangements of these letters.

(b) Determine the probability that the vowels are all placed together. (L)
Solution 3.35 

(a) Number of different arrangements $= \frac{11!}{2!\,2!\,2!} = 4\,989\,600$   (2 Ms, 2 As, 2 Ts)


(b) To find the number of ways with the vowels together, treat the vowels as one item. So treat
M, T, H, M, T, C, S and (AEAI) as 8 items.

Then number of arrangements $= \frac{8!}{2!\,2!} = 10\,080$   (2 Ms, 2 Ts)

The vowels A, E, A, I, however, can be arranged in $\frac{4!}{2!} = 12$ ways (2 As),

so total number of arrangements = 12 × 10 080 = 120 960

P(vowels together) $= \frac{120\,960}{4\,989\,600} = 0.024$ (2 s.f.)
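Listing all 11! orderings would be slow, so this Python sketch instead applies the repeated-objects formula three times, mirroring the solution. The helper name `count_arrangements` is illustrative, and the symbol `*` is an assumed stand-in for the glued block of vowels:

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

def count_arrangements(items) -> int:
    """n! / (p! q! ...) for groups of alike items."""
    return factorial(len(items)) // prod(factorial(k) for k in Counter(items).values())

total = count_arrangements("MATHEMATICS")   # all arrangements
outside = count_arrangements("MTHMTCS*")    # 8 items: M, T, H, M, T, C, S and the vowel block *
inside = count_arrangements("AEAI")         # orders of the vowels within the block
p = Fraction(outside * inside, total)

print(total, outside, inside)  # 4989600 10080 12
print(p, round(float(p), 3))   # 4/165 0.024
```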


Example 3.36 
The six letters of the word LONDON are each written on a card and the six cards are then
shuffled and placed in a line.


(a) Calculate the number of different arrangements.

(b) Find the probability that the middle two cards both have the letter N on them.

(c) Find the probability that the two cards with letter O are adjacent and the two cards with
letter N are also adjacent.

The cards are shuffled again and placed in a line, face down. The first two cards in the line are
turned over and reveal the letters L and O.

(d) Find the probability that when the other four cards are turned over the letters will spell
LONDON. (L)


Solution 3.36


(a) Number of different arrangements of LONDON $= \frac{6!}{2!\,2!} = 180$   (2 Os, 2 Ns)


(b) If the middle two letters are NN, then you need to find the number of different
arrangements of LODO.
Number of arrangements $= \frac{4!}{2!} = 12$   (2 Os)


P(middle two letters are NN) $= \frac{12}{180} = \frac{1}{15}$


(c) If the two Os and the two Ns are adjacent, then it is easier to think of each pair being glued
together.
Number of different arrangements of L, OO, NN, D = 4! = 24


P(two Os, two Ns are adjacent) $= \frac{24}{180} = \frac{2}{15}$


(d) If the first two cards are L, O then you need to find the number of different arrangements
of N, D, O, N.
Number of arrangements $= \frac{4!}{2!} = 12$   (2 Ns)

It is quite easy to list these arrangements:


*NDON DNNO 
NDNO DNON 
NODN DONN 
NOND ONND 
NNDO ONDN 
NNOD ODNN 


Of course, only one of these (marked *) will spell LONDON.


So P(L, O and four remaining letters spell LONDON) $= \frac{1}{12}$
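All four parts of this example can be checked against a full listing of the 180 distinct arrangements. In this Python sketch the helper `adjacent_pair` is an illustrative name; it relies on O and N each appearing exactly twice in LONDON:

```python
from fractions import Fraction
from itertools import permutations

space = set(permutations("LONDON"))  # the 180 distinct arrangements

def adjacent_pair(arr, letter):
    """True if the two copies of `letter` sit next to each other."""
    i, j = [k for k, ch in enumerate(arr) if ch == letter]
    return j - i == 1

middle_nn = [a for a in space if a[2:4] == ("N", "N")]
both_glued = [a for a in space if adjacent_pair(a, "O") and adjacent_pair(a, "N")]
starts_lo = [a for a in space if a[:2] == ("L", "O")]
# Part (d): given that the line starts L, O, how often do the rest spell NDON?
p_d = Fraction(sum("".join(a) == "LONDON" for a in starts_lo), len(starts_lo))

print(len(space))                             # 180
print(Fraction(len(middle_nn), len(space)))   # 1/15
print(Fraction(len(both_glued), len(space)))  # 2/15
print(len(starts_lo), p_d)                    # 12 1/12
```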


Result 3 
The number of ways of arranging $n$ unlike objects in a ring when clockwise and anticlockwise
arrangements are different is $(n-1)!$


For example, consider four people A, B, C and D, who are to be seated at a round table. The
following four arrangements are the same, as A always has D on his immediate right and B on
his immediate left.


[Diagram: the same seating of A, B, C and D shown in four rotated positions round the table]


To find the number of different arrangements, fix A and then consider the number of ways of
arranging B, C and D in the remaining three places.


Therefore the number of different arrangements of four people around the table is 3! = 6.
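"Fixing A" can be made concrete in code: rotate every seating so that it starts with A, and rotations of the same seating then become identical. A minimal Python sketch, assuming each seating is listed clockwise round the table:

```python
from itertools import permutations

def canonical_rotation(seating):
    """Rotate a seating (listed clockwise) so that it starts with A; rotations then compare equal."""
    k = seating.index("A")
    return seating[k:] + seating[:k]

# 4! = 24 orderings collapse to 24 / 4 = 6 distinct seatings
seatings = {canonical_rotation(p) for p in permutations("ABCD")}
print(len(seatings))  # 6 = 3!
```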
Result 4
The number of ways of arranging $n$ unlike objects in a ring, when clockwise and
anticlockwise arrangements are the same, is $\frac{(n-1)!}{2}$

For example, if A, B, C and D are four different coloured beads which are threaded on a ring,
then the following two arrangements are the same, since one is the other viewed from the
other side.

[Diagram: two rings of beads A, B, C, D, one the mirror image of the other]

Therefore the number of arrangements of four beads on a ring is $\frac{3!}{2} = 3$.


Example 3.37
Six bulbs are planted in a ring and two do not grow. What is the probability that the two that
do not grow are next to each other?


Solution 3.37
Let S be the possibility space, then n(S) = 5!
Let E be the event 'the bulbs that do not grow are next to each other'.
Consider the two bulbs that do not grow as one item. They can be arranged in 2! ways.
There are now five items to be arranged in a ring and this can be done in 4! ways.
Therefore n(E) = 2! × 4!
So $P(E) = \frac{n(E)}{n(S)} = \frac{2! \times 4!}{5!} = \frac{2}{5}$
The probability that the bulbs that do not grow are next to each other is $\frac{2}{5}$.


Example 3.38
One white, one blue, one red and two yellow beads are threaded on a ring to make a bracelet.
Find the probability that the red and white beads are next to each other.


Solution 3.38
Let S be the possibility space.
If all the objects were unlike, the number of ways of arranging five beads on a ring would be
$\frac{4!}{2}$, but since there are two yellows, $n(S) = \frac{4!}{2 \times 2!} = 6$
Let E be the event 'the red and the white beads are next to each other'.
The red and white can be arranged in 2! ways.
Then $n(E) = \frac{2! \times 3!}{2 \times 2!}$, where
3! is the number of ways of arranging four objects in a ring,
the 2 arises because anticlockwise and clockwise arrangements are the same,
the 2! arises because there are two yellows.
So n(E) = 3
and $P(E) = \frac{n(E)}{n(S)} = \frac{3}{6} = \frac{1}{2}$
The probability that the red and white beads are next to each other is $\frac{1}{2}$.

This result can be shown diagrammatically:

[Diagram: ways of arranging the beads, showing the six different bracelets made from W, B, R, Y, Y]

NOTE: as expected, in three of the six arrangements the red and white beads are next to each
other.
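Both ring results can be checked by computer. In this Python sketch the bulbs are modelled a little differently from the solution: the six planting positions round the ring are fixed and we choose which two fail, which leads to the same probability. For the bracelet, each arrangement is reduced to a canonical form under rotation and turning over (the helper names are illustrative), and each distinct bracelet is treated as equally likely, as in the solution:

```python
from fractions import Fraction
from itertools import combinations, permutations

# Example 3.37: positions 0..5 round a ring; choose which two bulbs fail
pairs = list(combinations(range(6), 2))
adjacent = [(i, j) for i, j in pairs if (j - i) % 6 in (1, 5)]
print(Fraction(len(adjacent), len(pairs)))  # 2/5

# Example 3.38: bracelets of W, B, R, Y, Y, equal under rotation and turning over
def canonical_bracelet(beads):
    n = len(beads)
    rotations = [beads[k:] + beads[:k] for k in range(n)]
    flipped = [r[::-1] for r in rotations]
    return min(rotations + flipped)

def red_next_to_white(beads):
    i, j = beads.index("R"), beads.index("W")
    return (i - j) % len(beads) in (1, len(beads) - 1)

bracelets = {canonical_bracelet(p) for p in set(permutations("WBRYY"))}
touching = [b for b in bracelets if red_next_to_white(b)]

print(len(bracelets), len(touching))            # 6 3
print(Fraction(len(touching), len(bracelets)))  # 1/2
```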
PERMUTATIONS OF r OBJECTS FROM n OBJECTS


Consider the number of ways of placing three of the letters A, B, C, D, E, F, G in three empty 
spaces. 


The first space can be filled in seven ways. The second space can be filled in six ways. The
third space can be filled in five ways. Therefore there are 7 × 6 × 5 ways of arranging three
letters taken from seven letters. This is the number of permutations of three objects taken
from seven and it is written $^7P_3$.

So $^7P_3 = 7 \times 6 \times 5 = 210$

Now 7 × 6 × 5 could be written

$\frac{7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{4 \times 3 \times 2 \times 1} = \frac{7!}{4!} = \frac{7!}{(7-3)!}$


On a calculator this can be obtained directly: [7] [nPr] [3] [=], i.e. $^7P_3$
(you may have to use the shift key).

NOTE: the order in which the letters are arranged is important since ABC is a different 
permutation from ACB. 

In general, the number of permutations, or ordered arrangements, of $r$ objects taken from $n$
unlike objects is written $^nP_r$, where

$^nP_r = \frac{n!}{(n-r)!}$


NOTE: Using the formula, $^nP_n = \frac{n!}{(n-n)!} = \frac{n!}{0!}$
But the number of ways of arranging $n$ unlike objects is $n!$
So 0! is defined to be 1, i.e.

0! = 1


Try it on your calculator.
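The formula, a direct listing and Python's built-in `math.perm` (available from Python 3.8) should all agree. A minimal sketch; `n_p_r` is an illustrative helper name:

```python
from itertools import permutations
from math import factorial, perm

def n_p_r(n: int, r: int) -> int:
    """Permutations of r objects taken from n unlike objects: n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

letters = "ABCDEFG"
listed = list(permutations(letters, 3))  # ordered: ABC and ACB are both counted

print(n_p_r(7, 3), perm(7, 3), len(listed))  # 210 210 210
print(n_p_r(7, 7), factorial(7))             # 5040 5040, which relies on 0! = 1
print(factorial(0))                          # 1
```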


COMBINATIONS OF r OBJECTS FROM n OBJECTS 


When considering the number of combinations of $r$ objects from $n$ objects, the order in which
they are placed is not important.


For example, the one combination ABC gives rise to 3! permutations 
ABC, ACB, BCA, BAC, CAB, CBA 


Denoting the number of combinations of three letters from the seven letters A, B, C, D, E, F,
G by $^7C_3$, then

$^7C_3 \times 3! = {^7P_3}$, so $^7C_3 = \frac{^7P_3}{3!} = \frac{210}{6} = 35$


On the calculator, $^7C_3$ can be obtained directly: [7] [nCr] [3] [=] (you may have to use the shift key).


In general, the number of combinations of $r$ objects taken from $n$ unlike objects is $^nC_r$, where

$^nC_r = \frac{n!}{r!\,(n-r)!}$


NOTE: $^nC_r$ is sometimes written $_nC_r$ or $\binom{n}{r}$
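The relationship $^7C_3 \times 3! = {^7P_3}$ can be seen by listing the combinations, which count ABC once only. A minimal Python sketch; `n_c_r` is an illustrative helper, and `math.comb` and `math.perm` are the standard-library versions (Python 3.8 and later):

```python
from itertools import combinations
from math import comb, factorial, perm

def n_c_r(n: int, r: int) -> int:
    """Combinations of r objects taken from n unlike objects: n! / (r! (n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))

chosen = list(combinations("ABCDEFG", 3))  # unordered: ABC appears once only

print(n_c_r(7, 3), comb(7, 3), len(chosen))      # 35 35 35
print(n_c_r(7, 3) * factorial(3) == perm(7, 3))  # True: 7C3 x 3! = 7P3
```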


Example 3.39 


In how many ways can a hand of four cards be dealt from an ordinary pack of 52 playing
cards?


Solution 3.39


You need to consider combinations, since the order in which the cards are dealt is not
important.


$^{52}C_4 = 270\,725$   On a calculator: [52] [nCr] [4] [=]


The number of ways of dealing the hand of four cards is 270 725. 


Example 3.40


Four letters are chosen at random from the word RANDOMLY. Find the probability that all
four letters chosen are consonants.


Solution 3.40
Let S be the possibility space, then $n(S) = {^8C_4} = 70$


Let E be the event 'four consonants are chosen'. Since there are six consonants,


$n(E) = {^6C_4} = 15$

$P(E) = \frac{n(E)}{n(S)} = \frac{15}{70} = \frac{3}{14}$


The probability that the four letters chosen are consonants is $\frac{3}{14}$.
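Since all eight letters of RANDOMLY are different, every selection can be listed directly. In this Python sketch Y is treated as a consonant, as in the solution (which counts six consonants):

```python
from fractions import Fraction
from itertools import combinations

word = "RANDOMLY"
vowels = set("AEIOU")  # Y is treated as a consonant, as in the solution
selections = list(combinations(word, 4))
all_consonants = [s for s in selections if not vowels.intersection(s)]

print(len(selections), len(all_consonants))            # 70 15
print(Fraction(len(all_consonants), len(selections)))  # 3/14
```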


Example 3.41 


A team of four is chosen at random from five girls and six boys.
(a) In how many ways can the team be chosen if 


(i) there are no restrictions; 
(ii) there must be more boys than girls? 


(b) Find the probability that the team contains only one boy. 
Solution 3.41
(a) (i) There are 11 people, from whom four are chosen. The order in which they are chosen
is not important.
Number of ways of choosing the team $= {^{11}C_4} = 330$
If there are no restrictions, the team can be chosen in 330 ways.


(ii) If there are to be more boys than girls, then there must be three boys and one girl, or
four boys.
Number of ways of choosing three boys and one girl $= {^6C_3} \times {^5C_1} = 20 \times 5 = 100$
On a calculator: [6] [nCr] [3] [=] [×] [5] [nCr] [1] [=]


Number of ways of choosing four boys $= {^6C_4} = 15$


So number of ways of choosing three boys and a girl, or four boys = 100 + 15 = 115.
Number of ways to choose the team with more boys than girls = 115.


(b) The possibility space S = {all possible teams of four}

n(S) = 330.
Let E be the event 'only one boy is chosen'. If one boy is chosen, then three girls must be
chosen,
so $n(E) = {^6C_1} \times {^5C_3} = 6 \times 10 = 60$

$P(E) = \frac{n(E)}{n(S)} = \frac{60}{330} = \frac{2}{11}$

The probability that the team contains only one boy is $\frac{2}{11}$.
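Every possible team can be listed and sorted by the number of boys. In this Python sketch the labels G1 to G5 and B1 to B6 are hypothetical names for the five girls and six boys:

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical labels: G1..G5 for the girls, B1..B6 for the boys
people = [f"G{i}" for i in range(1, 6)] + [f"B{i}" for i in range(1, 7)]
teams = list(combinations(people, 4))

def boys(team):
    return sum(1 for p in team if p.startswith("B"))

more_boys = [t for t in teams if boys(t) > 4 - boys(t)]  # three or four boys
one_boy = [t for t in teams if boys(t) == 1]

print(len(teams))                          # 330
print(len(more_boys))                      # 115
print(Fraction(len(one_boy), len(teams)))  # 2/11
```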


Example 3.42 


If a diagonal of a polygon is defined to be a line joining any two non-adjacent vertices, how
many diagonals are there in a polygon of (a) five sides, (b) six sides, (c) $n$ sides?


Solution 3.42 
(a) Number of ways to choose two points from five $= {^5C_2} = 10$

Note $^5C_2 = \frac{5!}{2!\,3!} = \frac{5 \times 4}{2}$   (3! cancels on the top and on the bottom)


So there are ten possible lines to draw, but as there are five sides, five of these are joining
adjacent vertices.

number of diagonals $= {^5C_2} - 5 = \frac{5 \times 4}{2} - 5 = 5$

The number of diagonals for a polygon with five sides is 5.


(b) Similarly, for the polygon of six sides,
the number of diagonals $= {^6C_2} - 6 = \frac{6 \times 5}{2} - 6 = 9$


The number of diagonals for a polygon with six sides is 9.
(c) For a polygon with $n$ sides,


the number of diagonals $= {^nC_2} - n = \frac{n(n-1)}{2} - n = \frac{n^2 - n - 2n}{2} = \frac{n(n-3)}{2}$


The number of diagonals for a polygon with $n$ sides is $\frac{n(n-3)}{2}$
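The formula $\frac{n(n-3)}{2}$ can be compared with a direct count of vertex pairs that are not sides. A minimal Python sketch, assuming the vertices are numbered 0 to $n-1$ in order round the polygon:

```python
from itertools import combinations

def diagonals_by_listing(n: int) -> int:
    """Vertices 0..n-1 in order round the polygon; a pair is a side if the vertices are adjacent."""
    def is_side(i, j):
        return (j - i) % n in (1, n - 1)
    return sum(1 for i, j in combinations(range(n), 2) if not is_side(i, j))

def diagonals_formula(n: int) -> int:
    return n * (n - 3) // 2  # n(n - 3) is always even, so integer division is exact

for n in range(3, 9):
    print(n, diagonals_by_listing(n), diagonals_formula(n))
```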


Example 3.43 


Three letters are chosen at random from the word BIOLOGY. Find the number of possible
selections.


Solution 3.43 


You need to find the total number of selections and, because there are two letters O, the
number of selections with


no letters O


one letter O
two letters O.


Number of selections without the letter O


= number of ways to choose three letters from B, I, L, G, Y
$= {^5C_3}$
= 10


Number of selections with one letter O (e.g. O, B, I; O, B, G; and so on)


= number of ways to choose two letters from B, I, L, G, Y
$= {^5C_2}$
= 10


Number of selections with two letters O (e.g. O, O, B; O, O, G; and so on)


= number of ways to choose one letter from B, I, L, G, Y
$= {^5C_1}$ = 5


Therefore, total number of selections = 10 + 10 + 5 = 25
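The same check can be done by computer. This Python sketch takes every choice of three of the seven letter positions and sorts each choice, so that choices which differ only in order, or only in which O was picked, count as the same selection:

```python
from collections import Counter
from itertools import combinations

# Sorting each choice makes O,B,I and B,I,O (with either O) count as the same selection
selections = {tuple(sorted(c)) for c in combinations("BIOLOGY", 3)}
by_number_of_os = Counter(s.count("O") for s in selections)

print(len(selections))                  # 25
print(sorted(by_number_of_os.items()))  # [(0, 10), (1, 10), (2, 5)]
```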


NOTE: it is easy to write them all out to check:

B, I, L    O, B, I    O, O, B
B, I, G    O, B, L    O, O, I
B, I, Y    O, B, G    O, O, L
B, L, G    O, B, Y    O, O, G
B, L, Y    O, I, L    O, O, Y
B, G, Y    O, I, G
I, L, G    O, I, Y
I, L, Y    O, L, G
I, G, Y    O, L, Y
L, G, Y    O, G, Y
  10         10         5


Example 3.44
Seven cards, labelled A, B, C, D, E, F, G, are thoroughly shuffled and then dealt out face
upwards on a table.

Find the probabilities, giving each as a fraction in its simplest form, that
(a) the first three cards to appear are the cards labelled A, B, C, in that order,
(b) the first three cards to appear are the cards labelled A, B, C, but in any order,
(c) the seven cards appear in their original order: A, B, C, D, E, F, G. (NEAB)


Solution 3.44
(a) Number of ways to arrange three letters out of seven
$= {^7P_3} = \frac{7!}{4!} = 7 \times 6 \times 5 = 210$
P(first three letters are A, B, C in that order) $= \frac{1}{210}$

(b) Number of ways to choose three letters from seven
$= {^7C_3} = \frac{7!}{4!\,3!} = 35$
P(first three letters are A, B, C in any order) $= \frac{1}{35}$

(c) Number of ways to arrange seven letters = 7! = 5040
P(A, B, C, D, E, F, G) $= \frac{1}{5040}$


Exercise 3f Arrangements, permutations, combinations

1. In how many ways can the letters of the word FACETIOUS be arranged in a line? What is the
probability that an arrangement begins with F and ends with S?

2. (a) In how many ways can seven people sit at a round table?
(b) What is the probability that a husband and wife sit together?

3. On a shelf there are four mathematics books and eight English books.
(a) If the books are to be arranged so that the mathematics books are together, in how many
ways can this be done?
(b) What is the probability that all the mathematics books will not be together?

4. The letters of the word PROBABILITY are arranged at random. Find the probability that the
two Is are separated.

5. If the letters in the word ABSTEMIOUS are arranged at random, find the probability that the
vowels and consonants appear alternately.

6. Nine children play a party game and hold hands in a circle.
(a) In how many different ways can this be done?
(b) What is the probability that Mary will be holding hands with her friends Natalie and Sarah?

7. (a) In how many different ways can the letters in the word ARRANGEMENTS be arranged?
(b) Find the probability that an arrangement chosen at random begins with the letters EE.

8. From a group of ten boys and eight girls, two pupils are chosen at random. Find the
probability that they are both girls.

9. From a group of six men and eight women, five people are chosen at random. Find the
probability that there are more men chosen than women.

10. A bag contains six white … and … Find the probability that … white counters and two
black counters are chosen.

11. From …, 5 are to be chosen. …
(a) In how many different ways can the …?
(b) … both the husband and the wife will be chosen.
(c) Find the probability that the three youngest people will be chosen.

12. In a group of six students, four are female and two are male. Determine how many
committees of three members can be formed containing one male and two females. (L)

13. Four persons are chosen at random from a group of ten persons consisting of four men and
six women. Three of the women are sisters. Calculate the probabilities that the four persons
chosen will
(a) consist of four women,
(b) consist of two women and two men,
(c) include the three sisters. (NEAB)

14. A touring party of 20 cricketers consists of nine batsmen, eight bowlers and three wicket
keepers. A team of 11 players must have at least five batsmen, four bowlers and one wicket
keeper. How many different teams can be selected,
(a) if all the players are available for selection,
(b) if two batsmen and one bowler are injured and cannot play?

15. Find the number of ways in which ten different …

16. Four letters are picked from the word BREAKDOWN. What is the probability that there is at
least one vowel among the letters?

17. Eight people sit in a minibus: four on the sunny side and four on the shady side. If two
people want to sit on opposite sides to each other and another two people want to sit on the
shady side, in how many ways can this be done?

18. Disco lights are arranged in a vertical line. How many different arrangements can be made
from two green, three blue and four red lights
(a) if all nine lights are used,
(b) if at least eight lights are used?

19. A group consisting of 10 boys and 11 girls attends a course for special games coaching.
(a) When they are introduced, each person hands a card containing his or her photograph and
name and address to every other member of the group. State the total number of cards which
are exchanged.
(b) 5 boys are selected for basketball and 6 girls for netball. Find the number of different
possible selections for each of these.
(c) 5 particular boys and 5 particular girls are selected and placed in mixed pairs for tennis.
Find the total number of different mixed pairs which can be made using these 10 children.
(d) If 4 children are chosen at random from the whole group find the probability that there is
a majority of girls in the 4 selected. (L Additional)
20. A competition has a first prize, a second prize, a third prize and a fourth prize. Ten
competitors enter this competition and the prizes are awarded for the first, second, third and
fourth competitors in order of merit.
(a) Find the number of different ways in which these prizes could be won.
Smith and Jones are two of the ten competitors. Find the number of different ways in which the
prizes could be won if
(b) neither Smith nor Jones wins a prize,
(c) each of Smith and Jones wins a prize. (C)

21. The number of applicants for a job is 15. Calculate the number of different ways in which
six applicants can be selected for interview.
The six selected applicants are interviewed on a particular day. Calculate the number of ways
in which the order of the six interviews can be arranged.
Of the six applicants interviewed, three have backgrounds in business, two have backgrounds
in education and one has a background in recreation. Calculate the number of ways in which
the order of the six interviews can be arranged, when applicants having the same background
are interviewed successively. (C)

22. Each of seven children, in turn, throws a ball once at a target. Calculate the number of ways
the children can be arranged in order to take the throws.
Given that three of the children are girls and four are boys, calculate the number of ways the
children can be arranged in order that
(a) successive throws are made by boys and girls alternately,
(b) a girl takes the first throw and a boy takes the last throw. (C)

23. To enter a cereal competition, competitors have to choose the eight most important features
of a new car, from a possible 12 features, then list the eight in order of preference. Each cereal
packet entry form contains space for five entries. A correct entry wins a new car.
(a) What is the probability that a woman wins a new car if she completes the entry form from
one packet?
(b) How many entry forms would she need to complete, each entry showing different
arrangements, if the probability that she wins a car is to be at least 0.8?

24. Three letters are selected at random from the word SCHOOL. Find the probability that the
selection
(a) does not contain the letter O,
(b) contains both the letters O.


25. How many even numbers can be formed with the digits 3, 4, 5, 6, 7 by using some or all of
the digits (repetitions are not allowed)?

26. [Figure: a row of four holes]
Different coloured pegs, each of which is painted in one and only one of the six colours red,
white, black, green, blue and yellow, are to be placed in four holes, as shown in the figure,
with one peg in each hole. Pegs of the same colour are indistinguishable. Calculate how many
different arrangements of pegs placed in the four holes so that they are all occupied can be
made from
(a) six pegs, all of different colours,
(b) two red and two white pegs,
(c) two red, one white and one black peg,
(d) twelve pegs, two of each colour. (L Additional)


27. (a) Calculate how many different numbers altogether can be formed by taking one, two,
three and four digits from the digits 9, 8, 3 and 2, repetitions not being allowed.
(b) Calculate how many of the numbers in part (a) are odd and greater than 800.
(c) If one of the numbers in part (a) is chosen at random, calculate the probability that it will
be greater than 300. (L Additional)


28. The positions of nine trees which are to be planted along the sides of a road, five on the
north side and four on the south side, are shown in the figure.

[Figure: five tree positions along the north side (N) of the road and four along the south side]

(a) Find the number of ways in which this can be done if the trees are all of different species.
(b) If the trees in (a) are planted at random, find the probability that two particular trees are
next to each other on the same side of the road.
(c) If there are three cupressus, four prunus and two magnolias, find the number of different
ways in which these could be planted assuming that trees of the same species are identical.
(d) If the trees in (c) are planted at random, find the probability that the two magnolias are on
the opposite sides of the road. (L Additional)


29. A committee consisting of six persons is to be selected from five women and six men.
(a) Calculate the number of ways in which the chosen committee will contain exactly two men.
(b) Given that the committee is to contain at least two men, show that it can be selected in
456 ways.
(c) Given that these 456 ways are equally likely, calculate the probability that there will be
more men than women on the committee.
(d) At a meeting the members of the chosen committee sit at a rectangular table in the fixed
seats illustrated in the diagram:

[Diagram: six fixed seats round a rectangular table]

(i) Given that each may sit in any of the six places, calculate the number of different ways
they may be seated at the table.
(ii) Given that the committee consists of three men and three women and that the men and
women must sit alternately round the table, calculate in how many different ways they may
be seated. (L Additional)

30. A committee of eight members consists of one married couple together with four other men
and two other women. From the committee a working party of four persons is to be formed.
Find the number of different working parties which can be formed.
Find also the number if the working party
(a) may not contain both the husband and his wife,
(b) must contain two men and two women,
(c) must contain at least one man and at least one woman.
The eight committee members sit round an octagonal table, their positions being decided by
drawing lots. Find the probability of
(d) the man sitting next to his wife,
(e) the man sitting opposite to his wife,
(f) the three women sitting together. (AEB)


Summary
• Experimental probability
$P(A) = \lim_{n \to \infty} \left(\frac{r}{n}\right)$, where $\frac{r}{n}$ is the relative frequency of A.

• Equally likely outcomes
$P(A) = \frac{n(A)}{n(S)}$, where n(A) is the number of outcomes in A and n(S) is the number of possible outcomes.
$0 \le P(A) \le 1$
If A is impossible, P(A) = 0. If A is certain, P(A) = 1.
P(A') = 1 - P(A), where A' is the event 'A does not occur'.

• For events A and B
P(A or B) = P(A) + P(B) - P(A and B), i.e. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
For mutually exclusive events A and B, $P(A \cap B) = 0$,
so P(A or B) = P(A) + P(B), i.e. $P(A \cup B) = P(A) + P(B)$   ('or' rule for exclusive events)

• For exhaustive events A and B
P(A or B) = 1, i.e. $P(A \cup B) = 1$
• Conditional probability
P(A given B) $= \frac{P(A \text{ and } B)}{P(B)}$, i.e. $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
$P(A \text{ and } B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$

• For independent events A and B
$P(A \mid B) = P(A)$
$P(B \mid A) = P(B)$
$P(A \text{ and } B) = P(A) \times P(B)$   ('and' rule for independent events)

• Tree diagrams (multiply along the branches)
[Tree diagram: first branches A and A', then branches B and B' from each]
$P(A \cap B) = P(A) \times P(B \mid A)$
$P(A \cap B') = P(A) \times P(B' \mid A)$
$P(A' \cap B) = P(A') \times P(B \mid A')$
$P(A' \cap B') = P(A') \times P(B' \mid A')$
$P(B) = P(A \cap B) + P(A' \cap B)$

• Arrangements, permutations and combinations
The number of ways of arranging n unlike objects in a line: $n!$
The number of ways of arranging in a line n objects of which p of one type are alike, q of another type are alike, r of a third type are alike, and so on: $\frac{n!}{p!\,q!\,r!\dots}$
The number of ways of arranging n unlike objects in a ring when clockwise and anticlockwise arrangements are different: $(n-1)!$
The number of ways of arranging n unlike objects in a ring when clockwise and anticlockwise arrangements are the same: $\frac{(n-1)!}{2}$
The number of permutations of r objects taken from n unlike objects: $^nP_r = \frac{n!}{(n-r)!}$
The number of combinations of r objects taken from n unlike objects: $^nC_r = \frac{n!}{r!\,(n-r)!}$


Miscellaneous worked examples


Example 3.45
A die is biased so that, when it is rolled, the probability of obtaining a score of 6 is $\frac{1}{4}$. The
probabilities of obtaining each of the other five scores 1, 2, 3, 4, 5 are all equal. Calculate the
probability of obtaining a score of five with this biased die.
(a) The biased die and an unbiased die are now rolled together. Calculate the probability that
the total score is 11 or more.
(b) The two dice are rolled again. Given that the total score is 11 or more, calculate the
probability that the score on the biased die is 6. (C)


Solution 3.45
$6_B$: score 6 on biased die     $6_U$: score 6 on unbiased die
$5_B$: score 5 on biased die     $5_U$: score 5 on unbiased die

For the biased die, $P(6_B) = \frac{1}{4}$
P(score is 1, 2, 3, 4 or 5) $= \frac{3}{4}$
$P(5_B) = \frac{1}{5} \times \frac{3}{4} = \frac{3}{20}$

(a) $P(\text{11 or more}) = P(6_B 6_U)^* + P(6_B 5_U)^* + P(5_B 6_U)$
$= \frac{1}{4} \times \frac{1}{6} + \frac{1}{4} \times \frac{1}{6} + \frac{3}{20} \times \frac{1}{6}$
$= \frac{1}{6}\left(\frac{1}{4} + \frac{1}{4} + \frac{3}{20}\right)$
$= \frac{13}{120}$

(b) $P(6_B \mid \text{score is 11 or more}) = \frac{P(6_B \text{ and score is 11 or more})}{P(\text{score is 11 or more})}$   (the terms marked * above)
$= \frac{\frac{1}{4} \times \frac{1}{6} + \frac{1}{4} \times \frac{1}{6}}{\frac{13}{120}} = \frac{\frac{1}{12}}{\frac{13}{120}} = \frac{10}{13}$


Example 3.46
During 1996 a vet saw 125 dogs, each suspected of having a particular disease. Of the
125 dogs, 60 were female of whom 25 actually had the disease and 35 did not. Only 20 of
the males had the disease, the rest did not. The case history of each dog was documented
on a separate record card.
(a) A record card from 1996 is selected at random. Let A represent the event that the dog
referred to is female and B represent the event that the dog referred to was suffering from
the disease.
Find
(i) P(A),
(ii) $P(A \cup B)$,
(iii) $P(A \cap B)$,
(iv) $P(A \mid B)$.
(b) If three different record cards are selected at random, without replacement, find the
probability that
(i) all three record cards relate to dogs with the disease,
(ii) exactly one of the three record cards relates to a dog with the disease,
(iii) one record card relates to a female dog with the disease, one to a male dog with the
disease and one to a female dog not suffering from the disease.


Solution 3.46
Summarising the information in a table:

               Diseased (B)   Not diseased (B')   Total
Female (A)          25               35             60
Male (A')           20               45             65
Total               45               80            125

(a) (i) $P(A) = \frac{60}{125} = 0.48$
(ii) $P(A \cup B) = \frac{25 + 35 + 20}{125} = \frac{80}{125} = 0.64$
(iii) $P(A \cap B) = \frac{25}{125} = 0.2$
(iv) $P(A \mid B) = \frac{25}{45} = \frac{5}{9}$

(b) (i) $P(BBB) = \frac{45}{125} \times \frac{44}{124} \times \frac{43}{123} = 0.045$ (2 s.f.)
(ii) Number of ways of arranging B, B', B' = 3
So P(BB'B' in any order) $= 3 \times \frac{45}{125} \times \frac{80}{124} \times \frac{79}{123} = 0.45$ (2 s.f.)
(iii) Number of ways of arranging the cards = 3!
So P(female with disease, male with disease, female without disease)
$= 3! \times \frac{25}{125} \times \frac{20}{124} \times \frac{35}{123} = 0.055$ (2 s.f.)


Example 3.47
A company needs to appoint three representatives, one to be based in Lancashire, one in
Yorkshire and one in Cumbria. There are eight sales officers available for selection to the post
of representative.
(a) Calculate the number of possible allocations of officers to representative posts.
(b) Calculate the number of different sets of three officers who could be appointed to
represent the company.
(c) Two of the eight sales officers are members of the Brown family. Assuming that the three
representatives are chosen at random from the eight officers, find the probability that both
members of the Brown family will be chosen. Give your answer as a fraction in its
simplest form. (NEAB)


Solution 3.47
(a) The first post can be allocated in 8 possible ways.
The second post can be allocated in 7 possible ways.
The third post can be allocated in 6 possible ways.
Number of allocations = 8 × 7 × 6 = 336
(b) Number of different sets of three officers $= {^8C_3} = 56$
(c) If both the Browns are chosen,
number of ways to choose third representative = 6
So P(both Browns are chosen) $= \frac{6}{56} = \frac{3}{28}$


Example 3.48
A factory has three machines A, B, C producing large numbers of a certain item. Of the total
daily production of the item, 50% are produced on A, 30% on B and 20% on C. Records
show that 2% of items produced on A are defective, 3% of items produced on B are defective
and 4% of items produced on C are defective. The occurrence of a defective item is
independent of all other items.
One item is chosen at random from a day's total output.
(a) Show that the probability of its being defective is 0.027.
(b) Given that it is defective, find the probability that it was produced on machine A. (W)


Solution 3.48
Events are defined as follows:
A: item produced on A    P(A) = 0.5    P(D given A) = 0.02
B: item produced on B    P(B) = 0.3    P(D given B) = 0.03
C: item produced on C    P(C) = 0.2    P(D given C) = 0.04
D: item is defective

[Tree diagram: branches for machines A, B, C, each followed by branches for defective (D) and not defective (D')]

$P(D \cap A) = 0.02 \times 0.5 = 0.01$
$P(D \cap B) = 0.03 \times 0.3 = 0.009$
$P(D \cap C) = 0.04 \times 0.2 = 0.008$
(a) P(D) = P(D and A) + P(D and B) + P(D and C)
= 0.01 + 0.009 + 0.008
= 0.027
(b) You already know that P(D given A) = 0.02, but now you need to 'reverse the conditions' to find P(A given D).
Use $P(A \text{ given } D) = \frac{P(A \text{ and } D)}{P(D)}$, where P(A and D) is marked on the tree and P(D) was found in (a):
$P(A \text{ given } D) = \frac{0.01}{0.027} = 0.370$ (3 d.p.)
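The total-probability and 'reverse the conditions' steps of Solution 3.48 can be checked with a short Python sketch. It is an illustration only: the dictionary names are hypothetical, and `fractions.Fraction` is used so that the arithmetic is exact rather than rounded.

```python
from fractions import Fraction

# Share of the day's output made on each machine, and the defect rate
# on each machine, as given in Example 3.48.
p_machine = {"A": Fraction(50, 100), "B": Fraction(30, 100), "C": Fraction(20, 100)}
p_defective_given = {"A": Fraction(2, 100), "B": Fraction(3, 100), "C": Fraction(4, 100)}

# Multiply along each branch of the tree: P(D and machine).
p_joint = {m: p_machine[m] * p_defective_given[m] for m in p_machine}

# (a) Add the three branches that end in D.
p_defective = sum(p_joint.values())

# (b) Reverse the conditions: P(A given D) = P(A and D) / P(D).
p_a_given_defective = p_joint["A"] / p_defective

print(float(p_defective))                     # 0.027
print(p_a_given_defective)                    # 10/27
print(round(float(p_a_given_defective), 3))   # 0.37
```

Keeping the answer as the exact fraction 10/27 shows where the rounded value 0.370 comes from.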


Example 3.49 
A house is infested with mice and to combat this the householder acquired four cats, Albert, 
Belinda, Khalid and Poon. The householder observes that only half of the creatures caught are 
mice. A fifth are voles and the rest are birds.


20% of the catches are made by Albert, 45% by Belinda, 10% by Khalid and 25% by Poon. 


(a) The probability of a catch being a mouse, a bird or a vole is independent of whether or 
not it is made by Albert. What is the probability of a randomly selected catch being a 


(i) mouse caught by Albert, 
(ii) bird not caught by Albert? 


(b) Belinda’s catches are equally likely to be a mouse, a bird or a vole. What is the probability 
of a randomly selected catch being a mouse caught by Belinda? 

(c) The probability of a randomly selected catch being a mouse caught by Khalid is 0.05. 
What is the probability that a catch made by Khalid is a mouse? 

(d) Given that the probability that a randomly selected catch is a mouse caught by Poon is 0.2,
verify that the probability of a randomly selected catch being a mouse is 0.5.

(e) What is the probability that a catch which is a mouse was made by Belinda? (AEB) 


Solution 3.49 


Events                      Probabilities

M: a mouse is caught        P(M) = 0.5
V: a vole is caught         P(V) = 0.2
B: a bird is caught         P(B) = 1 - (0.5 + 0.2) = 0.3
A: catch by Albert          P(A) = 0.2
L: catch by Belinda         P(L) = 0.45
K: catch by Khalid          P(K) = 0.1
N: catch by Poon            P(N) = 0.25


(a) (i) P(Mouse caught by Albert) = $P(M \cap A)$
= P(M) × P(A)        (M and A are independent)
= 0.5 × 0.2
= 0.1

(ii) P(Bird not caught by Albert) = $P(B \cap A')$
= P(B) × P(A')       (B and A' are independent)
= 0.3 × 0.8
= 0.24


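Before answering the next parts, it is useful to show all the given information on a tree diagram:

[Tree diagram: first branches A, L, K, N for the cat making the catch; second branches M, V, B for the creature caught. The mouse branches give:]
$P(M \cap A) = 0.1$ (from part (a)(i))
$P(M \cap L) = 0.15$ (from part (b))
$P(M \cap K) = 0.05$
$P(M \cap N) = 0.2$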


(b) P(Mouse caught by Belinda) = $P(M \cap L)$
$= 0.45 \times \frac{1}{3}$
= 0.15


(c) P(Catch is mouse caught by Khalid) = $P(M \cap K) = 0.05$

P(Catch by Khalid is a mouse) = $P(M \mid K)$
$= \frac{P(M \cap K)}{P(K)}$
$= \frac{0.05}{0.1}$
= 0.5


(d) P(Catch is mouse caught by Poon) = 0.2
P(Catch is a mouse) = $P(M \cap A) + P(M \cap L) + P(M \cap K) + P(M \cap N)$
= 0.1 + 0.15 + 0.05 + 0.2
= 0.5


(e) P(Catch which is a mouse was caught by Belinda) = $P(L \mid M)$
$= \frac{P(L \cap M)}{P(M)}$
$= \frac{0.15}{0.5}$
= 0.3
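A similar sketch, again in Python with exact fractions and hypothetical variable names, rebuilds the mouse branches of the tree diagram and reproduces parts (c) to (e) of Solution 3.49.

```python
from fractions import Fraction

# Share of all catches made by each cat (Example 3.49).
p_cat = {"Albert": Fraction(20, 100), "Belinda": Fraction(45, 100),
         "Khalid": Fraction(10, 100), "Poon": Fraction(25, 100)}

# P(mouse and cat) for each cat, taken from parts (a) to (d).
p_mouse_and = {
    "Albert": Fraction(1, 2) * p_cat["Albert"],    # (a): mouse independent of Albert
    "Belinda": p_cat["Belinda"] * Fraction(1, 3),  # (b): mouse, bird, vole equally likely
    "Khalid": Fraction(5, 100),                    # (c): given
    "Poon": Fraction(2, 10),                       # (d): given
}

# (c) P(M | K) = P(M and K) / P(K)
p_mouse_given_khalid = p_mouse_and["Khalid"] / p_cat["Khalid"]

# (d) P(M) is the sum of the four mouse branches
p_mouse = sum(p_mouse_and.values())

# (e) P(L | M) = P(L and M) / P(M)
p_belinda_given_mouse = p_mouse_and["Belinda"] / p_mouse

print(p_mouse_given_khalid, p_mouse, p_belinda_given_mouse)  # 1/2 1/2 3/10
```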
Miscellaneous exercise 3g

1. Each time a table tennis player serves, the probability that she wins the point is 0.6, independently of the result of any preceding serves. At the start of a particular game, she serves for each of the first five points. Calculate the probability that, for the first two points of this game,

(a) she wins both points,
(b) she wins exactly one of these two points.

Calculate the probability that, for the first five points of this game,

(c) she loses all five points,
(d) she wins at least one of these five points. (C)


2. A director of a company is selected at random.

C denotes the event that the director's annual salary is more than £300 000.
C' denotes the event that the director's annual salary is not more than £300 000.
D denotes the event that the director's annual salary is less than £200 000.
E denotes the event that the director's annual salary is less than £350 000.

Write down two of the events C, C', D and E which are

(a) complementary,
(b) mutually exclusive but not exhaustive,
(c) exhaustive but not mutually exclusive. (AEB)


3. Newborn babies are routinely screened for a serious disease which affects only two per 1000 babies. The result of screening can be positive or negative. A positive result suggests that the baby has the disease, but the test is not perfect. If a baby has the disease, the probability that the result will be negative is 0.01. If the baby does not have the disease, the probability that the result will be positive is 0.02.

(a) Find the probability that a baby has the disease, given that the result of the test is positive.
(b) Comment on the value you obtain. (L)


4. A penalty shoot-out in a game of hockey requires each of two players to take a penalty hit to try to score a goal. In a simple model, each player has a probability of 0.8 of scoring a goal, and independence is assumed. Calculate the probability that exactly one goal is scored from the two hits.

In an alternative model, the probability of the second player scoring is reduced to 0.7 if the first player does not score. Calculate the probability that the second player has scored, given that only one goal is scored. (C)


5. Forty 17- and 18-year-old students are the only people present at a party. The numbers of male and female students of each age are given in the following table.

           17-year-old   18-year-old
Male            9             13
Female          7             11

In the Grand Draw, each of the forty students has an equal chance of winning one of two prizes. The first prize is a gift token and the second prize is a box of chocolates. No student may win more than one prize. Find the probability that

(a) the gift token will be won by an 18-year-old male student,
(b) both prizes will be won by female students,
(c) the box of chocolates will be won by a 17-year-old student, given that the gift token is won by a 17-year-old male student. (C)


6. Each customer at a supermarket pays by one of cash, cheque or credit card. The probability of a randomly selected customer paying by cash is 0.54 and by cheque is 0.18.

(a) Determine the probability of a randomly selected customer paying by credit card.

Three customers are selected at random.
(b) Find the probability of
(i) all three paying by cash,
(ii) exactly one paying by cheque,
(iii) one paying by cash, one by cheque and one by credit card.

The probability that the amount payable exceeds £30 is 0.26. If the amount payable does exceed £30, then the probability of it being paid by cheque is 0.28.

(c) Find the probability that a randomly selected customer pays more than £30 and pays by cheque.
(d) Hence find the probability that a randomly selected customer pays more than £30, given that the customer pays by cheque. (AEB)


7. A writer submits a poem for publication by a literary magazine. The poem will be accepted for publication if it is approved by at least two of the three members of the editorial staff who independently assess it. Given that the probabilities that the poem is approved by the three members are 0.9, 0.7 and 0.6 respectively, find the probability that the poem is not accepted.

The writer submits a different poem for each of three separate issues of the magazine. Given that the probabilities remain the same, calculate the probability that all three of her poems are accepted. (C)


8. At an art exhibition seven paintings are to be hung in a row along one wall. Find the number of possible arrangements.

Given that three paintings are by the same artist, find the number of arrangements in which

(a) these three paintings are hung side by side,
(b) any one of these three paintings is hung at the beginning of the row but neither of the other two is hung at the end of the row. (C)


9. A group of three pregnant women attend ante-natal classes together. Assuming that each woman is equally likely to give birth on each of the seven days in a week, find the probability that all three give birth

(a) on a Monday,
(b) on the same day of the week,
(c) on different days of the week,
(d) at a weekend (either a Saturday or Sunday).
(e) Find the probability of all three giving birth on the same day of the week given that they all give birth at a weekend.
(f) How large would the group need to be to make the probability of all the women in the group giving birth on different days of the week less than 0.05? (AEB)


10. The probability that for any married couple the husband has a degree is … and the probability that the wife has a degree is … . The probability that the husband has a degree, given that the wife has a degree, is … .

A married couple is chosen at random. Find the probability that

(a) both of them have degrees,
(b) only one of them has a degree,
(c) neither of them has a degree.

Two married couples are chosen at random.

(d) Find the probability that only one of the two husbands and only one of the two wives have a degree. (L)


11. A personal stereo system consists of a playing unit and a headphone unit. Each unit is tested for faults. If a unit is found to be faulty, an attempt is made to correct the fault and the unit is then retested. Any unit that is found to be faulty a second time is rejected.

(a) The probability of a randomly chosen playing unit being found to be faulty on the first test is 0.1. If a second test is needed, the probability of a playing unit being found to be faulty on the second test is 0.05.
(i) Calculate the probability that a randomly chosen playing unit is rejected.
(ii) Given that a playing unit is accepted, calculate the probability that a fault was found on the first test. Give your answer correct to three significant figures.
(b) The probability of a randomly chosen headphone unit being found to be faulty on the first test is 0.04. If a second test is needed, the probability of a headphone unit being found to be faulty on the second test is 0.02. Calculate the probability that a randomly chosen headphone unit is accepted. Give your answer correct to three significant figures.
(c) A randomly chosen playing unit that has been accepted and a randomly chosen headphone unit that has been accepted are combined to make a personal stereo system. Calculate the probability that at least one of the two units has been retested. Give your answer correct to three significant figures. (C)


12. A bag contains four red counters, three blue counters and three green counters. A counter is drawn at random from the bag and not replaced. A second counter is then drawn at random from the bag.

Assuming that at each stage each counter left in the bag has an equal chance of being drawn,

(a) find the probability, giving your answers as fractions in their lowest terms, that the second counter will be blue given that
(i) the first counter is red,
(ii) the first counter is blue,
(iii) the first counter is green.
(b) Find the probability, giving your answer as a fraction in its lowest terms, that the first counter will be red and the second counter will be blue.
(c) Find the probability, giving your answer as a fraction in its lowest terms, that the second counter will be blue regardless of the colour of the first counter. (C)


13. A particular firm has six vacancies to fill from 15 applicants. Calculate the number of ways in which these vacancies could be filled if there are no restrictions.

The firm decides that three of the six vacancies shall be filled by women and three by men. The applicants consist of seven women and eight men. Calculate the number of ways in which the six vacancies could be filled under these conditions.

One of the seven women is the wife of one of the eight men. Calculate the number of ways in which three women and three men could fill the six vacancies, given that both the wife and her husband are among those appointed.

Of all the possible selections of three women and three men, one is picked at random. Calculate the probability that this selection includes

(a) both the wife and her husband,
(b) either the wife or her husband, but not both. (C)


14. Laura has 12 friends, seven girls and five boys, all of whom she wants to come to her birthday party. However, she is only allowed to invite five of them. Not wishing to show any favouritism, Laura chooses the five children to come to the party at random.

(a) How many different selections are possible?
(b) In how many selections are there exactly three boys?
(c) What is the probability that exactly three boys are invited to the party?

In fact, there are three girls at the party, including Laura, and three boys, including Liam and John. For the party tea they sit round a circular table, equally spaced, with Laura sitting in the position shown in the diagram.

[Diagram: circular table with six equally spaced seats, Laura's seat marked]

(d) In how many different ways can the other children fill the remaining seats?

With Laura sitting in her place, the other children take their seats at random.

(e) Find the probability that Laura sits next to Liam and John.
(f) Find the probability that boys and girls sit alternately. (MEI)


15. A draw is being made for the quarter-finals of a knock-out table tennis tournament. Eight counters, alike in every respect except that they are numbered from one to eight inclusive, are placed in a bag and drawn one by one, without replacement. A typical draw might produce the numbers in the order 3, 5, 7, 2, 1, 8, 6, 4, resulting in the matches:

Match A   3 plays 5
Match B   7 plays 2
Match C   1 plays 8
Match D   6 plays 4

(a) In how many different orders can the counters be drawn from the bag?
(b) In how many ways can the counters be drawn such that
(i) players 1 and 2 play each other in match A,
(ii) players 1 and 2 play each other?
(c) Find the probability that
(i) players 1 and 2 play each other,
(ii) players 1 and 2 do not play each other.

In fact, players 1, 2, 3 and 4 are girls and the rest are boys.

(d) In how many ways can the counters be drawn such that the girls play each other in matches A and B and the boys play each other in matches C and D?
(e) What is the probability that no girl plays a boy in the quarter-finals? (MEI)


16. In a set of 28 dominoes each domino has from 0 to 6 spots at each end. Each domino is different from every other and the ends are indistinguishable so that, for example, the two diagrams in figure 1 represent the same domino.

[Fig. 1: two drawings of the same domino]

A domino which has the same number of spots at each end, or no spots at all, is called a 'double'. A domino is drawn at random from the set. Figure 2 shows a sample space diagram to represent the complete set of outcomes, each of which is equally likely.

[Fig. 2: sample space diagram with both axes labelled 0 to 6. Fig. 3: the same sample space with event A marked.]

Let the event A be 'the domino is a double', event B 'the total number of spots on the domino is six' and event C be 'at least one end of the domino has five spots'.

Figure 3 shows the sample space with the event A marked.

(a) Write down the probability that event A occurs.
(b) Find the probability that either B or C or both occur.
(c) Determine whether or not events A and B are independent.
(d) Find the conditional probability P(A | C). Explain why events A and C are not independent.

After the first domino has been drawn, a second domino is chosen at random from the remainder.

(e) Find the probability that at least one end of the first domino has the same number of spots as at least one end of the second domino.
[HINT: Consider separately the cases where the first domino is a double and where it is not.] (MEI)
Mixed test 3A

1. A club social committee consists of eight people, two of whom are Nicky and Sam. Two of the eight committee members are to be chosen at random to organise the next club disco.

[Tree diagram: First choice (Nicky, Sam or Other) followed by Second choice (Nicky, Sam or Other, from those remaining)]

By considering the above tree diagram, or otherwise,

(a) find the probability that both Nicky and Sam are chosen,
(b) find the probability that both Nicky and Sam are chosen, given that at least one of Nicky and Sam is chosen. (C)


2. A bag contains ten balls, of which four are red and six are blue. An experiment consists of drawing at random and without replacement three balls, one at a time, from the bag.

(a) Draw a tree diagram to show all the possible outcomes of the experiment.

Hence, or otherwise, find the probability that

(b) the first two balls drawn will be of different colours,
(c) the third ball will be red,
(d) the third ball will be red, given that the first two balls drawn were both blue. (L)


3. Ann, Barry and Clare are three students taking a multiple choice examination paper. For each question a student has to select the correct answer from five that are offered. For Question 1, Ann has no idea of the correct answer, Barry correctly identifies one answer that is wrong and Clare correctly identifies two wrong answers. All three students decide to guess at random from the answers they think stand a chance of being correct. Calculate the probability that

(a) none of the three students chooses the correct answer,
(b) Clare is the only one to choose the correct answer,
(c) exactly one of the three students chooses the correct answer. (NEAB)


4. Last year the employees of a firm either received no pay rise, a small pay rise or a large pay rise. The following table shows the number in each category, classified by whether they were weekly paid or monthly paid.

                 No pay rise   Small pay rise   Large pay rise
Weekly paid          25              85                5
Monthly paid          4               8               23

A tax inspector decides to investigate the tax affairs of an employee selected at random.

D is the event that a weekly paid employee is selected.
E is the event that an employee who received no pay rise is selected.
D' and E' are the events "not D" and "not E" respectively.

Find
(a) P(D),
(b) $P(D \cup E)$,
(c) $P(D' \cap E')$.

F is the event that an employee is female.

(d) Given that P(F') = 0.8, find the number of female employees.
(e) Interpret P(D | F) in the context of this question.
(f) Given that $P(D \cap F) = 0.1$, find P(D | F). (AEB)


5. The captain of a darts team is trying to arrange an evening match for next Monday, Tuesday, Wednesday or Thursday. He hopes that the leading players, A, B, C and D, will all be free on one of these evenings. In fact each of the four players has arranged an engagement for exactly one of the four evenings.

Assuming that each player is equally likely to have chosen any one of the four evenings, and that their choices are independent, find the probability that

(a) A and B have both chosen Monday evening,
(b) either C or D (or both) has chosen Monday evening,
(c) the four players have chosen four different evenings,
(d) there will be at least one evening when all four players are free. (NEAB)
Mixed test 3B

1. A coin is biased so that, on each toss, the probability of obtaining a head is 0.4. The coin is tossed twice.
(a) Calculate the probability that at least one head is obtained.
(b) Calculate the conditional probability that exactly one head is obtained, given that at least one head is obtained. (C)

2. The probabilities of events A and B are P(A) and P(B) respectively.
$P(A) = \ldots$, $P(A \cap B) = \ldots$, $P(A \cup B) = q$.
Find, in terms of q,
(a) P(B),
(b) P(A | B).
Given that A and B are independent events,
(c) find the value of q. (L)

3. A questionnaire asks shareholders of a company to state whether they consider the chairman's salary to be too high, about right, or too low. Excluding shareholders who have no opinion, the probabilities of answers from a randomly selected shareholder are as follows:

Too high       …
About right    …
Too low        0.02

What is the probability that if three shareholders are selected at random,
(a) they will all answer 'too high',
(b) exactly two will answer 'too high',
(c) exactly two will give the same answer? (AEB)

4. A school has three photocopiers A, B and C. On any given day the independent probabilities of a breakdown are 0.1 for A, 0.05 for B and 0.04 for C.
For a randomly chosen day, calculate the probability that
(a) at least one of the copiers breaks down,
(b) exactly one of the copiers breaks down,
(c) given exactly one copier breaks down, it is copier C. (NEAB)

5. … From past results it seems that in years when the Ramblers win, the probability of them winning the next year is 0.7 and in years when the Strollers win, the probability of them winning the next year is 0.5. It is not possible for the quiz to result in the scores being tied.
The Ramblers won the quiz in 1996.
(a) Draw a probability tree diagram for the three years up to 1999.
(b) Find the probability that the Strollers will win in 1999.
(c) If the Strollers win in 1999, what is the probability that it will be their first win for at least three years?
(d) Assuming that the Strollers win in 1999, find the smallest value of n such that the probability of the Ramblers winning the quiz for n consecutive years after 1999 is less than 5%. (MEI)

Probability distributions 1: discrete variables

In this chapter you will learn
• about probability distributions for discrete random variables
• how to calculate and use E(X), the expectation (mean) of X
• how to calculate and use E(g(X)), the expectation of a simple function of X
• how to calculate and use Var(X), the variance of X
• about the cumulative distribution function F(x)
• about the results relating to expectation algebra for random variables X and Y.

This chapter is concerned with discrete variables. When a variable is discrete, it is possible to specify or describe all its possible numerical values, for example
• the number of females in a group of four students: the possible values are 0, 1, 2, 3, 4
• the amount gained, in pence, in a game where the entry fee is 10p and the prizes are 50p …
• the number of times you throw a die until a six appears: the possible values are 1, 2, 3, 4, 5, … to infinity.

PROBABILITY DISTRIBUTIONS

A probability distribution gives the probability of each possible value of the variable.
Consider this situation:

By mistake, three faulty fuses are put into a box containing two good fuses. The fuses become mixed up and are indistinguishable by sight. Two fuses are taken from the box at random, without replacement. What is the probability that you take
(a) no faulty fuses,
(b) one faulty fuse,
(c) two faulty fuses?
It is possible to show the outcomes and probabilities on a tree diagram (F = faulty, G = good):

Outcome    Probability                                  Number of faulty fuses
F, F       $\frac{3}{5} \times \frac{2}{4} = 0.3$        2
F, G       $\frac{3}{5} \times \frac{2}{4} = 0.3$        1
G, F       $\frac{2}{5} \times \frac{3}{4} = 0.3$        1
G, G       $\frac{2}{5} \times \frac{1}{4} = 0.1$        0

(a) P(no faulty fuses) = 0.1

(b) P(one faulty fuse) = 0.3 + 0.3 = 0.6

(c) P(two faulty fuses) = 0.3

The variable being considered here is 'the number of faulty fuses' and it can be denoted by X. The values that X can take are 0, 1 or 2.

The probability that there are no faulty fuses, i.e. the probability that the variable X takes the value 0, can be written P(X = 0), so P(X = 0) = 0.1.

Similarly P(X = 1) = 0.6 and P(X = 2) = 0.3.

Sometimes these are written $p_0 = 0.1$, $p_1 = 0.6$, $p_2 = 0.3$.

When defining variables, the variable is usually denoted by a capital letter (X, Y, R, etc.) and a particular value that the variable takes by a small letter (x, y, r, etc.), so that P(X = x) means 'the probability that the variable X takes the value x'.
The probability distribution for X can be summarised in a table and illustrated in a vertical line graph.

x           0     1     2
P(X = x)    0.1   0.6   0.3

[Vertical line graph of P(X = x) against x, for x = 0, 1, 2]

If the sum of the probabilities is 1, the variable is said to be a random variable.

In this example P(X = 0) + P(X = 1) + P(X = 2) = 0.1 + 0.6 + 0.3 = 1, so X is a discrete random variable.

For a discrete random variable, the sum of the probabilities is 1,

i.e. $\sum_{\text{all } x} P(X = x) = 1$ or $\sum_i p_i = 1$


The function that is responsible for allocating probabilities, P(X = x), is known as the probability density function of X, sometimes abbreviated to the p.d.f. of X. The probability density function can either list the probabilities individually or summarise them in a formula.
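For the fuses example, the probability distribution of X can also be found by listing every equally likely ordered pair of fuses. The Python sketch below assumes, as the tree diagram does, that the two fuses are taken without replacement; the labels F and G and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Box of five fuses: three faulty (F) and two good (G).
box = ["F", "F", "F", "G", "G"]

# Every ordered way of taking two different fuses without replacement
# is equally likely: 5 x 4 = 20 ordered pairs of positions in the box.
draws = list(permutations(range(len(box)), 2))

# X = number of faulty fuses among the two taken.
counts = Counter(sum(box[i] == "F" for i in pair) for pair in draws)
distribution = {x: Fraction(n, len(draws)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(x, float(p))
# 0 0.1
# 1 0.6
# 2 0.3

# The probabilities sum to 1, as they must for a random variable.
assert sum(distribution.values()) == 1
```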


Example 4.1

Two tetrahedral dice, each with faces labelled 1, 2, 3 and 4, are thrown and the score noted, where the score is the sum of the two numbers on which the dice land. Find the probability density function (p.d.f.) of X, where X is the random variable 'the score when two dice are thrown'.

Solution 4.1

The score for each possible outcome is shown in the possibility space:

                 4 |  5   6   7   8
   Second die    3 |  4   5   6   7
                 2 |  3   4   5   6
                 1 |  2   3   4   5
                   +----------------
                      1   2   3   4
                        First die

From the diagram you can see that X can take the values 2, 3, 4, 5, 6, 7, 8 only.

Since each outcome is equally likely, the probabilities can be found from the diagram. For example $P(X = 5) = \frac{4}{16}$, since 4 out of the possible 16 outcomes result in a score of 5.

The probability distribution is formed:

x           2      3      4      5      6      7      8
P(X = x)   1/16   2/16   3/16   4/16   3/16   2/16   1/16

Notice the pattern for the probabilities relating to x from 2 to 5:
$P(X = x) = \frac{x - 1}{16}$ for x = 2, 3, 4, 5
For x from 6 to 8, there is a different pattern:
$P(X = x) = \frac{9 - x}{16}$ for x = 6, 7, 8
These two formulae give the p.d.f. of X.

NOTE: $\sum_{\text{all } x} P(X = x) = \frac{1}{16}(1 + 2 + 3 + 4 + 3 + 2 + 1) = 1$, confirming that X is a random variable.
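The possibility space in Solution 4.1 can be generated rather than drawn. This Python sketch counts the 16 outcomes for two fair tetrahedral dice and checks the counts against the two formulae for the p.d.f.; the function name `pdf_formula` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]

# Count how many of the 16 equally likely (first die, second die)
# outcomes give each score, as in the possibility space diagram.
counts = Counter(a + b for a, b in product(faces, repeat=2))

def pdf_formula(x):
    """Piecewise p.d.f. read off in Solution 4.1."""
    if 2 <= x <= 5:
        return Fraction(x - 1, 16)
    if 6 <= x <= 8:
        return Fraction(9 - x, 16)
    return Fraction(0)

for x in sorted(counts):
    # The count over 16 matches the formula for every score x.
    assert Fraction(counts[x], 16) == pdf_formula(x)
    print(x, f"{counts[x]}/16")

# The probabilities sum to 1.
assert sum(pdf_formula(x) for x in range(2, 9)) == 1
```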


Example 4.2

The p.d.f. of a discrete random variable Y is given by $P(Y = y) = cy^2$ for y = 1, 2, 3, 4. Given that c is a constant, find the value of c.

Solution 4.2

It helps to write out the probability distribution of Y.

y           1     2     3     4
P(Y = y)    c     4c    9c    16c

Since Y is a random variable, $\sum_{\text{all } y} P(Y = y) = 1$, i.e. the sum of all the probabilities is 1.

So c + 4c + 9c + 16c = 1
30c = 1
$c = \frac{1}{30}$
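The requirement that the probabilities sum to 1 is what fixes the constant. A minimal Python sketch for Example 4.2, taking the p.d.f. as P(Y = y) = cy², which matches the table c, 4c, 9c, 16c:

```python
from fractions import Fraction

# Example 4.2: P(Y = y) = c * y**2 for y = 1, 2, 3, 4.
ys = [1, 2, 3, 4]
weights = [y ** 2 for y in ys]    # 1, 4, 9, 16

# The probabilities must sum to 1, so c * (1 + 4 + 9 + 16) = 1.
c = Fraction(1, sum(weights))
print(c)  # 1/30

# With this c the distribution is a valid probability distribution.
assert sum(c * w for w in weights) == 1
```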


Example 4.3

The discrete random variable W has probability distribution as shown.

w           -3     -2     -1     0      1
P(W = w)    0.1    0.25   0.3    0.15   d

Find
(a) the value of d,
(b) P(-3 ≤ W < 0),
(c) P(W > -1),
(d) P(-1 < W < 1),
(e) the mode.

Solution 4.3

(a) Since $\sum_{\text{all } w} P(W = w) = 1$,
0.1 + 0.25 + 0.3 + 0.15 + d = 1
0.8 + d = 1
d = 0.2

(b) P(-3 ≤ W < 0) = P(W = -3) + P(W = -2) + P(W = -1)
= 0.1 + 0.25 + 0.3
= 0.65

(c) P(W > -1) = P(W = 0) + P(W = 1)
= 0.15 + 0.2
= 0.35

(d) P(-1 < W < 1) = P(W = 0)
= 0.15

(e) The value of w with the highest probability is -1, so the mode = -1.

Exercise 4a Probability distributions

1. The discrete random variable X has the given probability distribution.

x           1     2      3     4     5
P(X = x)    0.2   0.25   0.4   a     0.05

(a) Find the value of a and draw a vertical line graph to illustrate the distribution.
(b) Find (i) P(1 < X < 3), (ii) P(X > 1), (iii) P(… < X < 5), (iv) the mode.


2. The probability density function of a discrete random variable X is given by P(X = x) = kx for x = 12, 13, 14.
Write out the probability distribution and find the value of k.


3. The discrete random variable X can take values 3, 5, 6, 8 and 10 only. Given that $p_3 = 0.1$, $p_5 = 0.05$, $p_6 = 0.45$ and $p_8 = 3p_{10}$, calculate $p_{10}$.


4. X has probability distribution as shown in the table.

x           1     2     3     4     5
P(X = x)    …     …     a     …     …

(a) Find the value of a.
(b) Find P(X > 4).
(c) Find P(X < 1).
(d) Find P(2 < X < 4).


5. Write out the probability distribution for each of 
these variables. 


(a) The number of heads, X, obtained when 
two fair coins are tossed. 

(b) The number of tails, X, obtained when three 
fair coins are tossed. 


6. A drawer contains eight brown socks and four 
blue socks. A sock is taken from the drawer at 
random, its colour is noted and it is then 
replaced. This procedure is performed twice 
more. X is the random variable 'the number of brown socks taken'. Find the probability
distribution for X. 


7. The discrete random variable R has p.d.f. 
P(R =r) =c(3 ~-1) forr=0, 1, 2, 3. 
(a) Find the value of the constant c. 
(b) Draw a vertical line graph to illustrate the 
distribution.
(c) Find P(1 < R < 3).


8. Write down the formula for the p.d.f. of X
where X is the numerical value of a digit chosen 
from a set of random number tables. 


9. A game consists of throwing tennis balls into a bucket from a given distance. The probability that William will get the tennis ball in the bucket is 0.4. A turn consists of three attempts.

(a) Construct the probability distribution for X, the number of tennis balls that land in the bucket in a turn.
(b) William wins a prize if, at the end of his turn, there are two or more tennis balls in the bucket. What is the probability that William does not win a prize?

10. Emma plays a game in which she throws two dice. If she gets two sixes, she wins 20p; if she gets one six, she wins 10p; otherwise she wins nothing. She has to pay 5p to enter.

Write out the probability distribution of X, the amount Emma gains in one turn.

11. A student has a fair coin and two six-sided dice, one of which is white and the other blue. The student tosses the coin and then rolls both dice. Let X be a random variable such that if the coin falls heads, X is the sum of the scores on the two dice, otherwise X is the score on the white die only.

Find the probability function of X in the form of a table of possible values of X and their associated probabilities.

Find P(3 < X < 7).

State the assumption you made to enable you to evaluate the probability function. (AEB)

12. X can take values 5, 6, 7, 8 and 9. The vertical line graph to illustrate the distribution of X is incomplete. Given that P(X = 8) = 2P(X = 9), complete the line graph and describe the distribution.

EXPECTATION OF X, E(X)

E(X) is read as 'E of X' and it gives an average or typical value of X, known as the expected value or expectation of X. This is comparable with the mean in descriptive statistics.


Experimental approach 


The frequency distribution shows the results when an unbiased die is thrown 120 times.

Score, x        1    2    3    4    5    6
Frequency, f   15   22   23   19   23   18    Total 120

The mean score, x̄ = Σxf / Σf
= (1×15 + 2×22 + 3×23 + 4×19 + 5×23 + 6×18) / 120
= 427/120
= 3.6 (2 s.f.)

You could write this out in a different way:

x̄ = 1×15/120 + 2×22/120 + 3×23/120 + 4×19/120 + 5×23/120 + 6×18/120

The fractions 15/120, 22/120, 23/120, 19/120, 23/120 and 18/120 are the relative frequencies of the scores of 1, 2, 3, 4, 5 and 6 respectively. Notice that they are all close to 1/6.

If you throw the die a large number of times, you would expect each of these fractions to get closer to 1/6, the limiting value of the relative frequency of a particular score on the die.
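The calculation above can be checked with a short Python sketch. It is only an illustration: the dictionary `freq` simply holds the frequency table, and exact `Fraction` arithmetic is used so that the relative frequencies match the fractions quoted in the text.

```python
from fractions import Fraction

# Frequency table from 120 throws of a die: score -> frequency
freq = {1: 15, 2: 22, 3: 23, 4: 19, 5: 23, 6: 18}

total = sum(freq.values())                                   # Σf
mean = Fraction(sum(x * f for x, f in freq.items()), total)  # Σxf / Σf

# Relative frequency of each score, f / Σf
rel_freq = {x: Fraction(f, total) for x, f in freq.items()}

# The mean equals Σ x × (relative frequency of x)
assert mean == sum(x * r for x, r in rel_freq.items())

print(total, mean, round(float(mean), 1))   # 120 427/120 3.6
```

Writing the mean as Σ x × (relative frequency) is exactly the form that leads to the theoretical expectation, where each relative frequency is replaced by a probability.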


Theoretical approach 


When an unbiased die is thrown, the probability of obtaining a particular value is 1/6.

The probability distribution is P(X = x) = 1/6 for x = 1, 2, 3, 4, 5, 6.

Score, x     1     2     3     4     5     6
P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6

The expected mean, or expectation of X, is obtained by multiplying each score by its probability, then summing. It is written E(X), so:

Expected mean, E(X) = 1×1/6 + 2×1/6 + 3×1/6 + 4×1/6 + 5×1/6 + 6×1/6 = 21/6 = 3.5

The expectation or expected mean can be thought of as the average value when the number of experiments increases indefinitely.


In a statistical experiment
• a practical approach results in a frequency distribution and a mean value,
• a theoretical approach results in a probability distribution and an expected value, known as the expectation.

The expectation of X (expected value or mean), written E(X), is given by

E(X) = Σ xP(X = x), where the sum is taken over all x.

This can also be written

E(X) = Σ xᵢpᵢ, i = 1, 2, ..., n, where pᵢ = P(X = xᵢ).

The symbol μ, pronounced 'mew', is often used for the expectation, where

μ = E(X)
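A minimal Python sketch of the definition E(X) = Σ xP(X = x), assuming a distribution is stored as a dictionary mapping each value x to P(X = x); the function name `expectation` is illustrative, not a standard library routine.

```python
from fractions import Fraction

def expectation(dist):
    """E(X) = Σ x·P(X = x), for a dict mapping each value x to P(X = x)."""
    # A probability distribution must have total probability 1
    if abs(sum(dist.values()) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

# Unbiased die: P(X = x) = 1/6 for x = 1, 2, ..., 6
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))   # 7/2
```

Using `Fraction` keeps the answer exact (7/2 rather than a rounded decimal). The function also accepts decimal probabilities; for instance, the distribution of Example 4.4 gives a value of about −0.2.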


Example 4.4
A random variable X has probability distribution as shown. Find the expectation, E(X).

x          −2     −1     0      1      2
P(X = x)   0.3    0.1    0.15   0.4    0.05


Solution 4.4

E(X) = Σ xP(X = x)
= (−2)×0.3 + (−1)×0.1 + 0×0.15 + 1×0.4 + 2×0.05
= −0.2


Example 4.5
Find the expected number of sixes when three fair dice are thrown.


Solution 4.5
X is the number of sixes and can take values 0, 1, 2, 3. Using the notation 6' to represent 'not a six':

P(X = 0) = P(6', 6', 6') = (5/6)³ = 125/216

P(X = 1) = P(6, 6', 6') + P(6', 6, 6') + P(6', 6', 6)
= 3 × (1/6) × (5/6)²
= 75/216

P(X = 2) = 3 × P(6, 6, 6') = 3 × (1/6)² × (5/6) = 15/216

P(X = 3) = P(6, 6, 6) = (1/6)³ = 1/216

The probability distribution for X is

x          0         1        2        3
P(X = x)   125/216   75/216   15/216   1/216

E(X) = Σ xP(X = x)
= 0×125/216 + 1×75/216 + 2×15/216 + 3×1/216
= 108/216
= 0.5

The expected number of sixes when three dice are thrown is 0.5.


NOTE: in 50 throws of the three dice you would expect 25 sixes. In practice you may not get 25 sixes. In 5000 throws, though, you may get very close to 2500 sixes. The expected value gives you the long-term average value.
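As a check on Solution 4.5, the sketch below enumerates all 216 equally likely outcomes of three fair dice and counts the sixes. This is a brute-force alternative to the reasoning above, not the method the text uses; note that `Fraction` reduces automatically, so 75/216 is printed as 25/72.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 6^3 = 216 equally likely outcomes of throwing three fair dice
outcomes = list(product(range(1, 7), repeat=3))

# Number of sixes in each outcome, tallied over all outcomes
counts = Counter(roll.count(6) for roll in outcomes)

# Probability distribution of X, the number of sixes
dist = {x: Fraction(counts[x], len(outcomes)) for x in range(4)}
mean = sum(x * p for x, p in dist.items())

for x, p in dist.items():
    print(x, p)
print("E(X) =", mean)   # E(X) = 1/2
```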


Symmetrical probability distributions


An important property which some distributions possess is that of symmetry, for example:

(a)
x          1     2     3     4     5
P(X = x)   0.1   0.2   0.4   0.2   0.1

[Vertical line graph of P(X = x) against x, symmetrical about x = 3]

It can be seen from the table or from the vertical line graph that the distribution is symmetrical about the central value X = 3, so E(X) = 3.

Check: E(X) = Σ xP(X = x)
= 1×0.1 + 2×0.2 + 3×0.4 + 4×0.2 + 5×0.1
= 3


(b) If X is the random variable 'the digit picked from a random number table', then the p.d.f. of X is P(X = x) = 0.1 for x = 0, 1, ..., 9.

[Vertical line graph of P(X = x) = 0.1 for x = 0, 1, ..., 9]

The distribution is symmetrical about the central value mid-way between 4 and 5, so E(X) = 4.5.


NOTE: the random variable X with p.d.f. P(X = x) = k, for all possible values of x, where k is a constant, is said to follow a discrete uniform distribution.


Example 4.6
A fruit machine consists of three windows which operate independently. Each window shows pictures of fruits: lemons, apples, cherries or bananas. The probability that a window shows a particular fruit is as follows.

P(lemon) = 0.4      P(cherries) = 0.2
P(banana) = 0.3     P(apple) = 0.1

The rules for playing a game on the fruit machine are:

Three apples: you win £1
Three cherries: you win 50p
Three lemons: you win 40p
Two apples and one cherries picture, in any order: you win 80p

It costs 10p to play a game.

Find the expected gain or loss if you play a game.


Solution 4.6
Let X be 'the amount gained, in pence, in a game'.
Taking into account the cost of 10p to play, X can take the values 90, 70, 40, 30, −10.

P(X = 90) = P(3 apples) = 0.1 × 0.1 × 0.1 = 0.001

P(X = 70) = P(2 apples and one cherries, in any order)
= P(A, A, C) + P(A, C, A) + P(C, A, A)
= 3 × 0.1² × 0.2
= 0.006

P(X = 40) = P(3 cherries) = 0.2³ = 0.008
P(X = 30) = P(3 lemons) = 0.4³ = 0.064

P(X = −10) = P(you win none of these prizes)
= 1 − (0.001 + 0.006 + 0.008 + 0.064) = 0.921

The probability distribution for X is

x          90      70      40      30      −10
P(X = x)   0.001   0.006   0.008   0.064   0.921

E(X) = Σ xP(X = x)
= 90×0.001 + 70×0.006 + 40×0.008 + 30×0.064 + (−10)×0.921
= −6.46

The expected loss per turn is 6.46p.

This means that, if you played the game, say, 1000 times, on average you could expect to lose £64.60.
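The fruit-machine calculation can also be sketched by enumerating all 4³ = 64 combinations of windows. The prize rules in `prize()` are written out from the rules listed in the example, and the windows are assumed independent, as stated; the function and variable names are illustrative only.

```python
from itertools import product

# Probability that a single window shows each fruit
p = {"lemon": 0.4, "apple": 0.1, "cherries": 0.2, "banana": 0.3}
COST = 10  # pence per game

def prize(windows):
    """Prize in pence for one combination of three windows (rules as listed in the example)."""
    c = {fruit: windows.count(fruit) for fruit in p}
    if c["apple"] == 3:
        return 100
    if c["apple"] == 2 and c["cherries"] == 1:
        return 80
    if c["cherries"] == 3:
        return 50
    if c["lemon"] == 3:
        return 40
    return 0

# E(gain) = Σ (prize - cost) × P(combination); windows are independent
expected_gain = 0.0
for windows in product(p, repeat=3):
    prob = p[windows[0]] * p[windows[1]] * p[windows[2]]
    expected_gain += (prize(windows) - COST) * prob

print(round(expected_gain, 2))   # -6.46
```

Enumeration avoids having to list the winning combinations by hand, at the cost of hiding the individual probabilities 0.001, 0.006, 0.008 and 0.064 that the written solution makes visible.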


Example 4.7
A newsagent stocks 12 copies of a magazine each week. He has regular orders for nine copies, and the number of additional copies sold varies from week to week. The newsagent uses previous sales data to estimate the probability, for each possible total number of copies sold, as follows.

Number of copies   9      10     11     12
Probability        0.20   0.35   0.30   0.15

(a) Calculate an estimate of the mean number of copies that he sells in a week.

(b) The newsagent buys the magazines at 85p each and sells them at £1.45 each. Any copies not sold are destroyed.
(i) Find the profit on these magazines in a week when he sells 11 copies.
(ii) Construct a probability distribution table for the newsagent's weekly profit from the sale of these magazines. Hence, or otherwise, calculate an estimate of his mean weekly profit. (NEAB)


Solution 4.7
(a) Let X be the number of copies he sells in a week.

E(X) = Σ xP(X = x)
= 9×0.20 + 10×0.35 + 11×0.30 + 12×0.15
= 10.4

An estimate of the mean number sold in a week is 10.4.
(b) (i) When he sells 11 copies, profit = 11 × £1.45 − 12 × £0.85 = £5.75
(ii) When he sells 9 copies, profit = 9 × £1.45 − 12 × £0.85 = £2.85
When he sells 10 copies, profit = 10 × £1.45 − 12 × £0.85 = £4.30
When he sells 12 copies, profit = 12 × £1.45 − 12 × £0.85 = £7.20
Let £Y be the weekly profit. The probability distribution of Y is

y          2.85   4.30   5.75   7.20
P(Y = y)   0.20   0.35   0.30   0.15

E(Y) = 2.85×0.20 + 4.30×0.35 + 5.75×0.30 + 7.20×0.15
= 4.88
An estimate of his mean weekly profit is £4.88.
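A short sketch of Solution 4.7, assuming the prices and sales probabilities given in the example; the names `sales`, `profit` and so on are illustrative. Profits are rounded to the nearest penny to avoid floating-point noise.

```python
# Estimated weekly sales: number of copies sold -> probability
sales = {9: 0.20, 10: 0.35, 11: 0.30, 12: 0.15}
STOCK = 12       # copies bought each week
COST = 0.85      # buying price per copy, in £
PRICE = 1.45     # selling price per copy, in £

mean_sold = sum(x * p for x, p in sales.items())

# Weekly profit Y: income from copies sold minus the cost of all 12 copies
# (unsold copies are destroyed, so their cost is still paid)
profit = {x: round(x * PRICE - STOCK * COST, 2) for x in sales}
mean_profit = sum(profit[x] * p for x, p in sales.items())

print(round(mean_sold, 2))      # 10.4
print(profit)
print(round(mean_profit, 2))    # 4.88
```

Because profit is a linear function of the number sold, the same answer also follows from 1.45 × E(X) − 12 × 0.85 = 1.45 × 10.4 − 10.20 = 4.88.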


Example 4.8
In a game, three dice are thrown. If you get a one or a six on any of the dice, you win £1; otherwise you have to pay £5. How much would you expect to win or lose when you play nine games?

You are now given the opportunity to change the rule for the amount you win when a one or a six appears. To make the game worthwhile to yourself, what is the minimum amount that you would suggest?


Solution 4.8
When a die is thrown,
P(1 or 6) = 2/6 = 1/3, so P(neither 1 nor 6) = 1 − 1/3 = 2/3
When three dice are thrown,
P(neither 1 nor 6 on all three) = (2/3)³ = 8/27
so P(1 or 6 turns up) = 1 − 8/27 = 19/27

Let £X be the amount won in a game.

The probability distribution of X is

x          −5      1
P(X = x)   8/27   19/27

So E(X) = Σ xP(X = x)
= (−5)×8/27 + 1×19/27
= −21/27 = −7/9

The negative amount indicates that you would expect to make a loss of £7/9 (about 78p) per game.
After nine games, expected loss = £7/9 × 9 = £7.

In making the game worthwhile, you would obviously want to ensure that you didn't make a loss.

Change the rule for payment to £y when you get 1 or 6 on any of the dice. The probability distribution of X becomes

x          −5      y
P(X = x)   8/27   19/27

You want E(X) > 0
i.e. (−40 + 19y)/27 > 0
−40 + 19y > 0
19y > 40
y > 2.105...

The minimum amount you should suggest that you win if you get a one or a six is £2.11. To make the game worthwhile, perhaps suggest £2.50.
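The calculation in Solution 4.8, including the break-even prize, can be sketched with exact fractions. `expected_amount` is a hypothetical helper that assumes the £5 penalty is unchanged when the prize is altered.

```python
from fractions import Fraction

# P(at least one 1 or 6 among three dice) = 1 - (2/3)^3
p_win = 1 - Fraction(2, 3) ** 3          # 19/27
p_lose = 1 - p_win                        # 8/27

def expected_amount(prize, penalty=5):
    """E(X) in £ when you win `prize` if a 1 or 6 appears and pay `penalty` otherwise."""
    return prize * p_win - penalty * p_lose

print(expected_amount(1))        # -7/9 per game
print(9 * expected_amount(1))    # -7 over nine games

# Break-even prize: solve prize × 19/27 = 5 × 8/27, giving prize = 40/19
break_even = 5 * p_lose / p_win
print(break_even, float(break_even))   # 40/19, about 2.105
```

Any prize strictly greater than 40/19 pounds makes E(X) positive, which is why the answer is rounded up to £2.11 rather than to the nearest penny.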




Exercise 4b Expectation 
1. The probability distribution for the random variable X is shown in the table:

x          0   1   2   3   4
P(X = x)   [values illegible in source]

Find E(X).


2. The random variable X has p.d.f. P(X = x) for x = 5, 6, 7, 8, 9 as defined in the table:

x          5   6   7   8   9
P(X = x)   [values illegible in source]

Find μ.


3. The probability distribution of a random variable X is as shown in the table:

[table illegible in source]

Find (a) the value of y; (b) E(X).


4. Find the expected number of heads when two fair coins are tossed.


5. A bag contains five black counters and six red counters. Two counters are drawn, one at a time, and not replaced. Let X be 'the number of red counters drawn'. Find E(X).


6. An unbiased tetrahedral die has faces marked 1, 2, 3, 4. If the die lands on the face marked 1, the player has to pay 10p. If it lands on a face marked with a 2 or a 4, the player wins 5p and if it lands on a 3, the player wins 3p. Find the expected gain in one throw.


7. A discrete random variable X can take values 10 and 20 only. If E(X) = 16, write out the probability distribution of X.


8. The discrete random variable X can take values 0, 1, 2 and 3 only. Given P(X ≤ 2) = 0.9, P(X ≤ 1) = 0.5 and E(X) = 1.4, find
(a) P(X = 1), (b) P(X = 0).


9.
x          0   1   2   3
P(X = x)   [values in terms of c; illegible in source]

The above table shows the probability distribution for a random variable X.

Calculate (a) c, (b) E(X). (L Additional)


10. A bag contains three red balls and one blue ball. A second bag contains one red ball and one blue ball. A ball is picked out of each bag and is then placed in the other bag. What is the expected number of red balls in the first bag?


11. In a game, a player rolls two balls down an inclined plane so that each ball finally settles in one of five slots and scores the number of points allotted to that slot as shown in the diagram below:

3   4   7   4   2

It is possible for both balls to settle in one slot and it may be assumed that each slot is equally likely to accept either ball.

The player's score is the sum of the points scored by each ball.

Draw up a table showing all the possible scores and the probability of each.

If the player pays 10p for each game and receives back a number of pence equal to his score, calculate the player's expected gain or loss per 50 games. (C Additional)


12. In a game a player tosses three fair coins. He wins £10 if three heads occur, £x if two heads occur, £3 if one head occurs and £2 if no heads occur. Express in terms of x his expected gain from each game.

Given that he pays £4.50 to play each game, calculate
(a) the value of x for which the game is fair,
(b) his expected gain or loss over 100 games if x = 4.90. (C Additional)


13. In an examination a candidate is given the four answers to four questions but is not told which answer applies to which question. He is asked to write down each of the four answers next to its appropriate question.
(a) Calculate in how many different ways he could write down the four answers.
(b) Explain why it is impossible for him to have just three answers in the correct places and show that there are six ways of getting just two answers in the correct places.
(c) If a candidate guesses at random where the four answers are to go and X is the number of correct guesses he makes, draw up the probability distribution for X in tabular form.
(d) Calculate E(X). (L Additional)


14. The discrete random variable X has p.d.f. P(X = x) = kx for x = 1, 2, 3, 4, 5 where k is a constant. Find E(X).


15. A woman has three keys on a ring, just one of which opens the front door. As she approaches the front door she selects one key after another at random without replacement. Draw a tree diagram to illustrate the various selections before she finds the correct key. Use this diagram to calculate the expected number of keys that she will use before opening the door. (L Additional)


THE EXPECTATION OF ANY FUNCTION OF X, E(g(X)) 


The definition of expectation can be extended to any function of X, such as 10X, X², 1/X, X − 4, etc.

In general, if g(X) is any function of the discrete random variable X, then

E(g(X)) = Σ g(x)P(X = x), where the sum is taken over all x.

For example,

E(10X) = Σ 10xP(X = x)
E(X²) = Σ x²P(X = x)
E(1/X) = Σ (1/x)P(X = x)
E(X − 4) = Σ (x − 4)P(X = x)


Example 4.9 


The random variable X has p.d.f. P(X = x) for x = 1, 2, 3 as shown:

x          1     2     3
P(X = x)   0.1   0.6   0.3

Calculate
(a) E(X), (b) E(3), (c) E(5X), (d) E(5X + 3).


Solution 4.9

(a) E(X) = Σ xP(X = x)
= 1×0.1 + 2×0.6 + 3×0.3
= 2.2

(b) E(3) = Σ 3P(X = x)
= 3×0.1 + 3×0.6 + 3×0.3
= 3

Notice that the expected value of a constant is equal to the constant.

(c) E(5X) = Σ 5xP(X = x)
= 5×0.1 + 10×0.6 + 15×0.3
= 11

Notice that 5E(X) = 5 × 2.2 = 11,
so E(5X) = 5E(X).

(d) E(5X + 3) = Σ (5x + 3)P(X = x)
= 8×0.1 + 13×0.6 + 18×0.3
= 14

Notice that E(5X) + E(3) = 11 + 3 = 14,
so E(5X + 3) = E(5X) + E(3),
i.e. E(5X + 3) = 5E(X) + 3.


In general, for constants a and b,
E(aX + b) = aE(X) + b
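The general rule E(g(X)) = Σ g(x)P(X = x) translates directly into a function that takes g as a parameter. This sketch, with the illustrative helper `expect`, reproduces the four answers of Example 4.9 and checks E(aX + b) = aE(X) + b for a = 5, b = 3.

```python
from fractions import Fraction

# Distribution from Example 4.9, as exact fractions
dist = {1: Fraction(1, 10), 2: Fraction(6, 10), 3: Fraction(3, 10)}

def expect(g, dist):
    """E(g(X)) = Σ g(x)·P(X = x) over all values x."""
    return sum(g(x) * p for x, p in dist.items())

E_X = expect(lambda x: x, dist)                  # 11/5, i.e. 2.2
E_3 = expect(lambda x: 3, dist)                  # a constant's expectation is the constant
E_5X = expect(lambda x: 5 * x, dist)             # 11
E_5X_plus_3 = expect(lambda x: 5 * x + 3, dist)  # 14

# Linearity: E(aX + b) = aE(X) + b
assert E_5X == 5 * E_X
assert E_5X_plus_3 == 5 * E_X + 3
print(E_X, E_3, E_5X, E_5X_plus_3)   # 11/5 3 11 14
```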


Example 4.10

A six-sided die has faces marked with the numbers 1, 3, 5, 7, 9 and 11. It is biased so that the probability of obtaining the number R in a single roll of the die is proportional to R.

(a) Show that the probability distribution of R is given by

P(R = r) = r/36, r = 1, 3, 5, 7, 9, 11.

(b) The die is to be rolled and a rectangle drawn with sides of lengths 6 cm and R cm. Calculate the expected value of the area of the rectangle.

(c) The die is to be rolled again and a square drawn with sides of length 24R⁻¹ cm. Calculate the expected value of the perimeter of the square. (NEAB)


Solution 4.10
(a) P(R = r) = kr, so
r          1    3    5    7    9    11
P(R = r)   k    3k   5k   7k   9k   11k

ΣP(R = r) = 1, so k + 3k + 5k + 7k + 9k + 11k = 1
36k = 1
k = 1/36

The distribution is
r          1      3      5      7      9      11
P(R = r)   1/36   3/36   5/36   7/36   9/36   11/36

i.e. P(R = r) = r/36 for r = 1, 3, 5, 7, 9, 11.

(b) The area of the rectangle is A = 6R, so
E(A) = E(6R) = 6E(R)

E(R) = Σ rP(R = r)
= 1×1/36 + 3×3/36 + 5×5/36 + 7×7/36 + 9×9/36 + 11×11/36
= 286/36 = 143/18

E(A) = 6 × 143/18 = 143/3 = 47⅔

The expected value of the area is 47⅔ cm².

(c) The perimeter of the square is P = 4 × 24R⁻¹ = 96/R, so
E(P) = E(96/R) = 96E(1/R)

E(1/R) = Σ (1/r)P(R = r)
= 1/1 × 1/36 + 1/3 × 3/36 + 1/5 × 5/36 + 1/7 × 7/36 + 1/9 × 9/36 + 1/11 × 11/36
= 6/36 = 1/6

E(P) = 96 × 1/6 = 16

The expected value of the perimeter is 16 cm.
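Example 4.10 can be checked in the same way. The sketch assumes the probabilities are exactly r/36 and applies E(g(R)) with g(r) = 6r for the area and g(r) = 96/r for the perimeter; variable names are illustrative.

```python
from fractions import Fraction

faces = [1, 3, 5, 7, 9, 11]

# P(R = r) is proportional to r; the constant is 1/Σr = 1/36
k = Fraction(1, sum(faces))
dist = {r: k * r for r in faces}

E_R = sum(r * p for r, p in dist.items())                  # E(R)
E_area = 6 * E_R                                            # E(6R) = 6E(R)
E_perimeter = 96 * sum(Fraction(1, r) * p for r, p in dist.items())   # 96E(1/R)

print(k, E_R, E_area, E_perimeter)   # 1/36 143/18 143/3 16
```

Note that E(1/R) is not 1/E(R): here E(1/R) = 1/6, whereas 1/E(R) = 18/143.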


Example 4.11
X is the number of heads obtained when two coins are tossed. Find

(a) the expected number of heads,
(b) E(X²),
(c) E(X² − X).


Solution 4.11

P(X = 0) = P(T, T) = 1/2 × 1/2 = 1/4
P(X = 1) = P(T, H) + P(H, T) = 1/2 × 1/2 + 1/2 × 1/2 = 1/2
P(X = 2) = P(H, H) = 1/2 × 1/2 = 1/4

The probability distribution for X is

x          0     1     2
P(X = x)   1/4   1/2   1/4

(a) E(X) = 1 (by symmetry)

(b) E(X²) = Σ x²P(X = x)
= 0²×1/4 + 1²×1/2 + 2²×1/4
= 3/2

(c) E(X² − X) = Σ (x² − x)P(X = x)
= 0×1/4 + 0×1/2 + 2×1/4
= 1/2

Notice that E(X²) − E(X) = 3/2 − 1 = 1/2 and E(X² − X) = 1/2,
so E(X² − X) = E(X²) − E(X).


In general, for two functions of X, g(X) and h(X),
E(g(X) + h(X)) = E(g(X)) + E(h(X))
For example
E(X + X²) = E(X) + E(X²) and E(3X − 4X²) = 3E(X) − 4E(X²)


Example 4.12
The discrete random variable X has the following probability distribution.

x          0      1      2      3      4
P(X = x)   0.20   0.20   0.20   0.20   0.20

(a) Write down the name of the distribution of X.
(b) Find P(0 ≤ X < 2).
(c) Find E(X).
(d) Find E(X² + 3X).


Solution 4.12
(a) It is a discrete uniform distribution.
(b) P(0 ≤ X < 2) = P(X = 0) + P(X = 1) = 0.2 + 0.2 = 0.4
(c) By symmetry, E(X) = 2
(d) E(X² + 3X) = E(X²) + 3E(X)
E(X²) = Σ x²P(X = x)
= (0² + 1² + 2² + 3² + 4²) × 0.2
= 6
so E(X² + 3X) = 6 + 3E(X) = 6 + 6 = 12


VARIANCE OF X, Var(X) OR V(X)


Remember that variance = (standard deviation)².


Experimental approach
For a frequency distribution with mean x̄, the variance s² is given by

s² = Σ(x − x̄)²f / Σf

This can also be written

s² = Σx²f / Σf − x̄²


Theoretical approach


For a discrete random variable X, with E(X) = μ, the variance is defined as follows:


The variance of X, written Var(X), is given by
Var(X) = E(X − μ)²

Alternatively, Var(X) = E(X − μ)²
= E(X² − 2μX + μ²)
= E(X²) − 2μE(X) + E(μ²)
= E(X²) − 2μ² + μ²   (since E(X) = μ, and μ² is a constant)
= E(X²) − μ²

Var(X) = E(X²) − μ²

This format is usually easier to work with.


NOTE: μ = E(X), so μ² = (E(X))².

This is very cumbersome to write, so it is often written E²(X). This is similar to the notation used in trigonometry, where (cos A)² is written cos² A.
You could write Var(X) = E(X²) − E²(X).
Var(X) is sometimes written as σ² (σ is pronounced 'sigma').


σ = √Var(X) = standard deviation of X


Example 4.13
The random variable X has probability distribution as shown in the table:

x          1     2     3     4     5
P(X = x)   0.1   0.3   0.2   0.3   0.1

Find
(a) μ = E(X),
(b) E(X²),
(c) Var(X),
(d) σ, the standard deviation of X.


Solution 4.13

(a) By symmetry, μ = E(X) = 3

(b) E(X²) = Σ x²P(X = x)
= 1×0.1 + 4×0.3 + 9×0.2 + 16×0.3 + 25×0.1
= 10.4

(c) Var(X) = E(X²) − μ²
= 10.4 − 9
= 1.4

(d) σ = √1.4 = 1.18 (3 s.f.)
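To confirm that the two forms of the variance agree, the sketch below computes Var(X) for Example 4.13 both from the definition E((X − μ)²) and from the shortcut E(X²) − μ². Variable names are illustrative.

```python
import math
from fractions import Fraction

# Distribution from Example 4.13
dist = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(2, 10),
        4: Fraction(3, 10), 5: Fraction(1, 10)}

mu = sum(x * p for x, p in dist.items())                         # E(X)
E_X2 = sum(x ** 2 * p for x, p in dist.items())                  # E(X²)
var_shortcut = E_X2 - mu ** 2                                    # E(X²) - μ²
var_definition = sum((x - mu) ** 2 * p for x, p in dist.items()) # E((X - μ)²)

assert var_shortcut == var_definition   # both forms give the same variance
sigma = math.sqrt(var_shortcut)         # standard deviation
print(mu, E_X2, var_shortcut, round(sigma, 2))   # 3 52/5 7/5 1.18
```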


Example 4.14
Two boxes each contain three cards. The first box contains cards labelled 1, 3 and 5; the second box contains cards labelled 2, 6 and 8. In a game, a player draws one card at random from each box and his score, X, is the sum of the numbers on the two cards.

(a) Obtain the six possible values of X and find the corresponding probabilities.
(b) Calculate E(X), E(X²) and the variance of X.

Solution 4.14
(a) Possibility space (entries are the sums):

                 Second box
                 2    6    8
First box   1    3    7    9
            3    5    9    11
            5    7    11   13

Each of the nine outcomes has probability 1/9, so the probability distribution is

x          3     5     7     9     11    13
P(X = x)   1/9   1/9   2/9   2/9   2/9   1/9

(b) E(X) = Σ xP(X = x)
= 3×1/9 + 5×1/9 + 7×2/9 + 9×2/9 + 11×2/9 + 13×1/9
= 75/9 = 8⅓

E(X²) = Σ x²P(X = x)
= 9×1/9 + 25×1/9 + 49×2/9 + 81×2/9 + 121×2/9 + 169×1/9
= 705/9 = 78⅓

Var(X) = E(X²) − E²(X)
= 78⅓ − (8⅓)²
= 80/9 = 8 8/9
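The possibility space of Example 4.14 can be generated rather than drawn. This sketch assumes each of the nine card pairs is equally likely, as the question implies, and builds the distribution of the sum with a `Counter`.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

first_box = [1, 3, 5]
second_box = [2, 6, 8]

# Each of the 3 × 3 = 9 pairs of cards is equally likely
sums = Counter(a + b for a, b in product(first_box, second_box))
dist = {x: Fraction(n, 9) for x, n in sorted(sums.items())}

E_X = sum(x * p for x, p in dist.items())
E_X2 = sum(x ** 2 * p for x, p in dist.items())
var = E_X2 - E_X ** 2

print(dist)
print(E_X, E_X2, var)   # 25/3 235/3 80/9
```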


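The following results relating to variance are useful.

Var(aX + b) = a²Var(X), where a and b are any constants.

Adding the constant b shifts every value by the same amount, so it does not change the spread.

For example

Var(2X) = 2²Var(X)
= 4Var(X)
Var(2X + 3) = 2²Var(X)
= 4Var(X)
Var(5 − X) = (−1)²Var(X)
= Var(X)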
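The result Var(aX + b) = a²Var(X) can be tested numerically. The sketch uses the distribution from Example 4.13; `transform` is an illustrative helper that maps each value x to ax + b while keeping its probability.

```python
from fractions import Fraction

def var(dist):
    """Var(X) = E(X²) - E²(X) for a dict {x: P(X = x)}."""
    mean = sum(x * p for x, p in dist.items())
    return sum(x * x * p for x, p in dist.items()) - mean ** 2

def transform(dist, a, b):
    """Distribution of aX + b: each value x becomes a·x + b with the same probability."""
    out = {}
    for x, p in dist.items():
        out[a * x + b] = out.get(a * x + b, 0) + p
    return out

# Distribution of Example 4.13, for which Var(X) = 1.4
dist = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(2, 10),
        4: Fraction(3, 10), 5: Fraction(1, 10)}

for a, b in [(2, 0), (2, 3), (-1, 5)]:
    # b shifts every value equally, so only a affects the variance
    assert var(transform(dist, a, b)) == a ** 2 * var(dist)

print(var(dist), var(transform(dist, 2, 3)))   # 7/5 28/5
```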


Exercise 4c Expectation and variance

1. The discrete random variable X has p.d.f. P(X = x) for x = 1, 2, 3.

x          1     2     3
P(X = x)   0.2   0.3   0.5

Find (a) E(X), (b) E(X²), (c) Var(X).


2. The discrete random variable X has the probability distribution specified in the following table.

x          −1     0      1      2
P(X = x)   0.25   0.10   0.45   0.20

(a) Find P(−1 < X < 1).
(b) Find E(2X + 3).


3. The discrete random variable X has p.d.f.
P(X = 0) = 0.05, P(X = 1) = 0.45, P(X = 2) = 0.5. Find
(a) μ = E(X), (b) E(X²), (c) E(5X² + 2X − 3).


4. The discrete random variable X has p.d.f. P(X = x) = k for x = 1, 2, 3, 4, 5, 6. Find
(a) E(X), (b) E(X²), (c) E(3X + 4), (d) Var(X).


5. The random variable X takes values 2, 4, 6, 8, and its probability distribution is represented in the vertical line graph.

[Vertical line graph of P(X = x) for x = 2, 4, 6, 8]

Find Var(X).


6. A roulette wheel is divided into six sectors of unequal area, marked with the numbers 1, 2, 3, 4, 5 and 6. The wheel is spun and X is the random variable 'the number on which the wheel stops'. The probability distribution of X is as follows:

x          1   2   3   4   5   6
P(X = x)   [values illegible in source]

Calculate (a) E(X), (b) E(X²), (c) E(3X − 5), (d) E(6X²), (e) Var(X).


7. The random variable X has p.d.f. P(X = x) as shown in the table:

x          −2    −1    0     1     c
P(X = x)   0.1   0.1   0.3   0.4   0.1

Find the value of c (a) if E(X) = 0.3, (b) if E(X²) = 1.8.


8. The discrete random variable X has probability function given by

P(X = x) = (…)ˣ   for x = 1, 2, 3, 4, 5,
           c      for x = 6,
           0      otherwise,

where c is a constant.

Determine the value of c and hence the mode and mean of X. (L)


9. A game consists of tossing four unbiased coins simultaneously. The total score is calculated by giving three points for each head and one point for each tail. The random variable X represents the total score.
(a) Show that P(X = 8) = 3/8.
(b) Copy and complete the table, given below, for the symmetrical probability distribution of X.

x          4   6   8   10   12
P(X = x)

(c) Calculate the variance of X. (NEAB)


10. Find Var(X) for each of the following probability distributions:
(a) [table illegible in source]
(b) [table illegible in source]
(c)
x          0      …      …      6
P(X = x)   0.11   0.35   0.46   0.08


11. X is the random variable 'the number on a biased die', and the p.d.f. of X is as shown:

x          1   2   3   4   5   6
P(X = x)   [values illegible in source]

Find (a) the value of y, (b) E(X), (c) E(X²), (d) Var(X), (e) Var(4X).
12. A team of three is to be chosen from four boys and five girls. If X is the random variable 'the number of girls in the team', find (a) E(X), (b) E(X²), (c) Var(X).


13. Two discs are drawn without replacement from a bag […] discs drawn', construct a probability distribution […] (d) Var(3X − 4).


14. If X is the random variable 'the sum of the scores on two tetrahedral dice', where the score is the number on which the die lands, find (a) E(X), (b) Var(X), (c) Var(2X), (d) Var(2X + 3).


15. The discrete random variable X has probability distribution as shown in the table.

x          10    20    30
P(X = x)   0.1   0.6   0.3

Find Var(2X + 3).


16. Two discs are drawn, without replacement, from a box containing three red discs and four white discs. The discs are drawn at random. If X is the random variable 'the number of red discs drawn', find (a) the expected number of red discs, (b) the standard deviation of X.


17. Ten identically shaped discs are in a bag; two of them are black, the rest white. Discs are drawn at random from the bag in turn and not replaced. Let X be the number of discs drawn up to and including the first black one.

List the values of X and the associated theoretical probabilities.

Calculate the mean value of X and its standard deviation. What is the most likely value of X?

If, instead, each disc is replaced before the next is drawn, construct a similar list of values and point out the chief differences between the two lists.


18. The discrete random variable X has p.d.f.
P(X = x) = k|x|
where x takes the values −3, −2, −1, 0, 1, 2, 3.
Find (a) the value of the constant k, (b) E(X), (c) E(X²), (d) the standard deviation of X.


19. The random variable X takes integer values only and has p.d.f.
P(X = x) = kx,          x = 1, 2, 3, 4, 5
P(X = x) = k(10 − x),   x = 6, 7, 8, 9
Find (a) the value of the constant k, (b) E(X), (c) Var(X), (d) E(2X − 3), (e) Var(2X − 3).


20. (a) In a game a player pays £5 to toss three fair coins. Depending on the number of tails he obtains he receives a sum of money as shown in the table below.

Number of tails   3     2    1    0
Sum received      £10   £6   £3   […]

Calculate the player's expected gain or loss.

(b) A variable X has a probability distribution shown in the table below.

Value of X    10    20    50   100
Probability   0.5   0.3   p    q

Given that X can only take the values 10, 20, 50 or 100, and that E(X) = 25, calculate
(i) the value of p and of q,
(ii) the variance of X.

In a fairground game, a player rolls discs on to a board containing squares, each of which bears one of the numbers 10, 20, 50 or 100. If a disc falls entirely within a square, the player receives the same number of pence as the number in the square; if it does not, the player does not receive anything. The probability that a player will receive money from any given roll is […]. If a player does receive money, the probabilities of receiving 10p, 20p, 50p or £1 are the same as those connected with the values of X above. How many discs should a player be allowed to roll for 50p, if the game is to be fair? (C Additional)


21. (a) A man takes part in a game in which he throws two fair dice and scores the sum of the two numbers shown. The rewards for the scores are given in the following table.

Score        12   10   7   5     other
Reward (£)   16   6    3   […]   0

Calculate the expected reward for a throw.

(b) A bag contains five identical discs, two of which are marked with the letter A and three with the letter B. The discs are randomly drawn, one at a time without replacement, until both discs marked A are obtained. Show that the probability that three draws are required is 1/5.
Given that X denotes the number of draws required to obtain both discs marked A, copy and complete the following table.

Value of X          2   3     4   5
Probability of X        1/5

Evaluate (i) E(X), (ii) E(X²), […]


THE CUMULATIVE DISTRIBUTION FUNCTION, F(x)

For a frequency distribution, the cumulative frequencies are found by summing all the frequencies up to a particular value. In the same way, in a probability distribution, the probabilities can be summed to give cumulative probabilities. The cumulative probability function is written F(x).

Consider the following probability distribution:

x          1      2     3     4      5
P(X = x)   0.05   0.4   0.3   0.15   0.1

F(1) = P(X ≤ 1) = 0.05
F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.05 + 0.4 = 0.45
F(3) = P(X ≤ 3) = 0.75
F(4) = P(X ≤ 4) = 0.9
F(5) = P(X ≤ 5) = 1

Notice that F(5) gives the total probability.

The cumulative distribution function is

x      1      2      3      4     5
F(x)   0.05   0.45   0.75   0.9   1

In general, for the discrete random variable X, the cumulative distribution function is given by F(x), where

F(x) = P(X ≤ x)

Sometimes F(x) can be given by a formula, as in the following example.

Example 4.15
The discrete random variable X has cumulative distribution function F(x) = x/6 for x = 1, 2, ..., 6. Write out the probability distribution and suggest what X represents.

Solution 4.15
The cumulative distribution function is

x      1     2     3     4     5     6
F(x)   1/6   2/6   3/6   4/6   5/6   1

You can find the probability distribution from the table.
P(X = 1) = F(1) = 1/6
P(X = 2) = F(2) − F(1) = 2/6 − 1/6 = 1/6
P(X = 3) = F(3) − F(2) = 3/6 − 2/6 = 1/6, and so on.

The probability distribution is

x          1     2     3     4     5     6
P(X = x)   1/6   1/6   1/6   1/6   1/6   1/6

This is the uniform distribution, P(X = x) = 1/6, x = 1, 2, ..., 6. X could represent the score when a die is thrown.


Example 4.16
For a discrete random variable X the cumulative distribution function F(x) is as shown:

x      1     2      3      4     5
F(x)   0.2   0.32   0.67   0.9   1

Find (a) P(X = 3), (b) P(X > 2).


Solution 4.16
(a) From the table,
F(3) = P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.67
F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.32
P(X = 3) = F(3) − F(2)
= 0.67 − 0.32 = 0.35
(b) P(X > 2) = 1 − P(X ≤ 2)
= 1 − F(2)
= 1 − 0.32
= 0.68


Example 4.17
The cumulative probabilities for a random variable X are given in the following table, where X takes the values 0, 1, 2, ..., 10.

x    F(x)
0    0.0388
1    0.1756
2    0.4049
3    0.6477
4    0.8298
5    0.9327
6    0.9781
7    0.9941
8    0.9987
9    0.9998
10   1

Use the table to find
(a) P(X ≤ 5),
(b) P(X > 3),
(c) P(2 < X ≤ 7),
(d) P(X = 7),
(e) P(X ≥ 8).


Solution 4.17
(a) P(X ≤ 5) = F(5) = 0.9327
(b) P(X > 3) = 1 − P(X ≤ 3) = 1 − 0.6477 = 0.3523
(c) P(2 < X ≤ 7) = F(7) − F(2) = 0.9941 − 0.4049 = 0.5892
(d) P(X = 7) = P(X ≤ 7) − P(X ≤ 6) = 0.9941 − 0.9781 = 0.016
(e) P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − F(7) = 1 − 0.9941 = 0.0059
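Working with a cumulative distribution table is mostly a matter of taking differences of F. This sketch stores the F(x) values from Example 4.17 in a list and defines hypothetical helpers `cdf` and `pmf`; it assumes X takes only the integer values 0 to 10.

```python
from itertools import accumulate

# Cumulative probabilities F(x) = P(X ≤ x) from Example 4.17, x = 0, 1, ..., 10
F = [0.0388, 0.1756, 0.4049, 0.6477, 0.8298, 0.9327,
     0.9781, 0.9941, 0.9987, 0.9998, 1.0]

def cdf(x):
    """F(x) = P(X ≤ x) for integer x; 0 below the smallest value, 1 above the largest."""
    if x < 0:
        return 0.0
    return F[min(x, len(F) - 1)]

def pmf(x):
    """P(X = x) = F(x) - F(x - 1)."""
    return cdf(x) - cdf(x - 1)

print(round(cdf(5), 4))            # P(X ≤ 5)     = 0.9327
print(round(1 - cdf(3), 4))        # P(X > 3)     = 0.3523
print(round(cdf(7) - cdf(2), 4))   # P(2 < X ≤ 7) = 0.5892
print(round(pmf(7), 4))            # P(X = 7)     = 0.016
print(round(1 - cdf(7), 4))        # P(X ≥ 8)     = 0.0059

# Going the other way: the running totals of the pmf recover F
assert all(abs(a - b) < 1e-12 for a, b in zip(accumulate(pmf(x) for x in range(11)), F))
```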




Exercise 4d Cumulative distribution function


1. The probability distribution for the random variable Y is shown in the table:

y          0.1    0.2    0.3   0.4    0.5
P(Y = y)   0.05   0.25   0.3   0.15   0.25

Construct the cumulative distribution table.


2. For a discrete random variable R the cumulative distribution function F(r) is as shown in the table:

r      1      2      3      4
F(r)   0.13   0.54   0.75   1

Find (a) P(R = 2), (b) P(R > 1), (c) P(R > 3), (d) P(R < 2), (e) E(R).


3. Construct the cumulative distribution tables for the following discrete random variables:
(a) the number of sixes obtained when two ordinary dice are thrown,
(b) the smaller number when two ordinary dice are thrown,
(c) the number of heads when three fair coins are tossed.


4. For the discrete random variable X the cumulative distribution function F(x) is as shown:

x      3      4      5      6      7
F(x)   0.01   0.23   0.64   0.86   1

Construct the probability distribution of X, and find Var(X).


5. For a discrete random variable X the cumulative distribution function is given by F(x) = […] for x = 1, 2, 3.
(a) Find F(2).
(b) Find P(X = 2).
(c) Write out the probability distribution of X.
(d) Find E(2X − 3).


6. For a discrete random variable X the cumulative distribution function is given by F(x) = kx, x = 1, 2, 3. Find
(a) the value of the constant k,
(b) P(X < 3),
(c) the probability distribution of X,
(d) the standard deviation of X.


7. The discrete random variable X has distribution function F(x) where
F(x) = 1 − (1 − x/4)ˣ for x = 1, 2, 3, 4
(a) Show that F(3) = 63/64 and F(2) = 3/4.
(b) Obtain the probability distribution of X.
(c) Find E(X) and Var(X).
(d) Find P(X > E(X)).


8. The cumulative probabilities for X are given in the following table, where X takes the values 0, 1, 2, ..., 12.

x    F(x)
0    0.0145
1    0.0692
2    0.2064
3    0.4114
4    0.6296
5    0.8042
6    0.9133
7    0.9679
8    0.9900
9    0.9974
10   0.9994
11   0.9999
12   1

Use the table to find
(a) P(X < 8),
(b) P(X = 5),
(c) P(X … 4),
(d) P(3 < X < 7),
(e) P(… X < 9).


TWO INDEPENDENT RANDOM VARIABLES


If X and Y are any two random variables, then
E(X + Y) = E(X) + E(Y)

If X and Y are independent random variables, then
Var(X + Y) = Var(X) + Var(Y)

To illustrate this, consider two independent random variables X and Y.

x          0     1     2
P(X = x)   0.1   0.5   0.4

y          1     2     3
P(Y = y)   0.3   0.2   0.5

E(X) = Σ xP(X = x) = 0×0.1 + 1×0.5 + 2×0.4 = 1.3
E(Y) = Σ yP(Y = y) = 1×0.3 + 2×0.2 + 3×0.5 = 2.2

E(X²) = Σ x²P(X = x) = 0²×0.1 + 1²×0.5 + 2²×0.4 = 2.1
E(Y²) = Σ y²P(Y = y) = 1²×0.3 + 2²×0.2 + 3²×0.5 = 5.6

Var(X) = E(X²) − E²(X) = 2.1 − 1.3² = 0.41
Var(Y) = E(Y²) − E²(Y) = 5.6 − 2.2² = 0.76

Notice that
E(X) + E(Y) = 1.3 + 2.2 = 3.5             ①
Var(X) + Var(Y) = 0.41 + 0.76 = 1.17      ②

Now consider the distribution of X + Y, where X + Y can take the values 1, 2, 3, 4, 5.
For example,

P(X + Y = 4) = P(X = 1 and Y = 3) + P(X = 2 and Y = 2)
= 0.5 × 0.5 + 0.4 × 0.2
= 0.33

A tree diagram shows all the outcomes:

X   Y   X + Y   Probability
0   1   1       0.1 × 0.3 = 0.03
0   2   2       0.1 × 0.2 = 0.02
0   3   3       0.1 × 0.5 = 0.05
1   1   2       0.5 × 0.3 = 0.15
1   2   3       0.5 × 0.2 = 0.1
1   3   4       0.5 × 0.5 = 0.25
2   1   3       0.4 × 0.3 = 0.12
2   2   4       0.4 × 0.2 = 0.08
2   3   5       0.4 × 0.5 = 0.2

The probability distribution for X + Y is

X + Y         1      2      3      4      5
Probability   0.03   0.17   0.27   0.33   0.2

E(X + Y) = 1×0.03 + 2×0.17 + 3×0.27 + 4×0.33 + 5×0.2
= 3.5
= E(X) + E(Y) from ①. So E(X + Y) = E(X) + E(Y).

To find the variance, consider first E((X + Y)²).
E((X + Y)²) = 1×0.03 + 4×0.17 + 9×0.27 + 16×0.33 + 25×0.2
= 13.42
Var(X + Y) = 13.42 − 3.5²
= 1.17
= Var(X) + Var(Y) from ②. So Var(X + Y) = Var(X) + Var(Y).
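The illustration above can be reproduced by building the distributions of X + Y and X − Y directly from the product rule for independent variables. The helper names are illustrative, and floating-point results are rounded when printed.

```python
from itertools import product

# Distributions of the two independent random variables used above
X = {0: 0.1, 1: 0.5, 2: 0.4}
Y = {1: 0.3, 2: 0.2, 3: 0.5}

def mean(d):
    """E(V) = Σ v·P(V = v)."""
    return sum(v * p for v, p in d.items())

def var(d):
    """Var(V) = E(V²) - E²(V)."""
    return sum(v * v * p for v, p in d.items()) - mean(d) ** 2

def combine(op):
    """Distribution of op(X, Y), using P(X = x and Y = y) = P(X = x) × P(Y = y)."""
    out = {}
    for (x, px), (y, py) in product(X.items(), Y.items()):
        v = op(x, y)
        out[v] = out.get(v, 0) + px * py
    return out

S = combine(lambda x, y: x + y)   # X + Y
D = combine(lambda x, y: x - y)   # X - Y

print({v: round(p, 2) for v, p in sorted(S.items())})
print(round(mean(S), 2), round(var(S), 2))   # 3.5 1.17
print(round(mean(D), 2), round(var(D), 2))   # -0.9 1.17
```

The last line previews the result for X − Y: the means subtract, but the variances still add.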


If you perform similar calculations to find the expectation and variance of X − Y, you will find that

$E(X - Y) = E(X) - E(Y)$ but $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$

The following general results are useful:

In general, for random variables X and Y and constants a and b,
$E(aX + bY) = aE(X) + bE(Y)$
$E(aX - bY) = aE(X) - bE(Y)$
If X and Y are independent, then
$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$
$\mathrm{Var}(aX - bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$

Notice the + sign here.

Example 4.18
X and Y are independent random variables such that
E(X) = 10, Var(X) = 2, E(Y) = 8, Var(Y) = 3.
Find (a) E(5X + 4Y), (b) Var(5X + 4Y), (c) Var(½X − Y), (d) Var(½X + Y).

Solution 4.18
(a) $E(5X + 4Y) = 5E(X) + 4E(Y) = 5 \times 10 + 4 \times 8 = 82$
(b) $\mathrm{Var}(5X + 4Y) = 5^2\,\mathrm{Var}(X) + 4^2\,\mathrm{Var}(Y) = 25 \times 2 + 16 \times 3 = 98$
(c) $\mathrm{Var}(\tfrac{1}{2}X - Y) = (\tfrac{1}{2})^2\,\mathrm{Var}(X) + \mathrm{Var}(Y) = \tfrac{1}{4} \times 2 + 3 = 3.5$
(d) $\mathrm{Var}(\tfrac{1}{2}X + Y) = (\tfrac{1}{2})^2\,\mathrm{Var}(X) + \mathrm{Var}(Y) = \tfrac{1}{4} \times 2 + 3 = 3.5$
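The general results reduce Example 4.18 to arithmetic. A small hypothetical helper, `lin_comb`, is sketched below; it assumes X and Y are independent, so only their means and variances are needed. Passing a negative b handles differences, and because b is squared in the variance, the + sign in Var(aX − bY) appears automatically.

```python
from fractions import Fraction as F


def lin_comb(a, b, ex, vx, ey, vy):
    """Return (E(aX + bY), Var(aX + bY)) for independent X and Y.

    A negative b gives a difference; since b is squared in the variance,
    Var(aX - bY) comes out as a^2 Var(X) + b^2 Var(Y).
    """
    return a * ex + b * ey, a ** 2 * vx + b ** 2 * vy


EX, VX, EY, VY = 10, 2, 8, 3                    # Example 4.18
half = F(1, 2)
print(lin_comb(5, 4, EX, VX, EY, VY))           # parts (a) and (b)
print(lin_comb(half, -1, EX, VX, EY, VY)[1])    # part (c): Var(X/2 - Y)
print(lin_comb(half, 1, EX, VX, EY, VY)[1])     # part (d): Var(X/2 + Y)
```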
DISTRIBUTION OF X₁ + X₂ + ... + Xₙ
Consider the random variable X where E(X) = μ and Var(X) = σ².
Take two observations X₁ and X₂ from X. Then
$E(X_1) = E(X_2) = \mu, \quad \mathrm{Var}(X_1) = \mathrm{Var}(X_2) = \sigma^2$
$E(X_1 + X_2) = E(X_1) + E(X_2) = \mu + \mu = 2\mu = 2E(X)$

If the observations are independent,
$\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) = \sigma^2 + \sigma^2 = 2\sigma^2 = 2\,\mathrm{Var}(X)$

This result can be extended to n observations:
$E(X_1 + X_2 + \dots + X_n) = n\mu = nE(X)$

If the observations are independent,
$\mathrm{Var}(X_1 + X_2 + \dots + X_n) = n\sigma^2 = n\,\mathrm{Var}(X)$


Example 4.19

Find the expectation and variance of the number of heads obtained when six coins are tossed.

Solution 4.19
Let X be the number of heads when a coin is tossed once, where X can take the values 0, 1.
First find the expectation and variance of X. The probability distribution is

x          0     1
P(X = x)   0.5   0.5

$E(X) = 0.5$ (by symmetry)
$E(X^2) = 1^2 \times 0.5 = 0.5$
so $\mathrm{Var}(X) = E(X^2) - E^2(X) = 0.5 - 0.5^2 = 0.25$

Now consider $Y = X_1 + X_2 + \dots + X_6$, where Y is the number of heads when six coins are tossed.
$E(Y) = 6E(X) = 6(0.5) = 3$
$\mathrm{Var}(Y) = 6\,\mathrm{Var}(X) = 6(0.25) = 1.5$
The expected number of heads is 3 and the variance is 1.5.
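The result Var(X₁ + ... + Xₙ) = n Var(X) can be checked for Example 4.19 by building the distribution of Y one toss at a time. The sketch below is a minimal illustration, assuming the tosses are independent; `convolve` is a hypothetical helper that forms the distribution of A + B from those of A and B.

```python
from collections import defaultdict
from fractions import Fraction as F

coin = {0: F(1, 2), 1: F(1, 2)}   # X: number of heads on a single toss


def convolve(a, b):
    # Distribution of A + B when A and B are independent
    out = defaultdict(F)
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] += px * py
    return dict(out)


def mean(d):
    return sum(v * p for v, p in d.items())


def var(d):
    return sum(v * v * p for v, p in d.items()) - mean(d) ** 2


n = 6
Y = {0: F(1)}          # before any toss the count of heads is 0 with certainty
for _ in range(n):
    Y = convolve(Y, coin)

print(mean(Y), n * mean(coin))   # E(Y) and 6E(X)
print(var(Y), n * var(coin))     # Var(Y) and 6Var(X)
```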


COMPARING THE DISTRIBUTIONS OF X₁ + X₂ and 2X

Confusion sometimes arises between the random variable X₁ + X₂, where X₁ and X₂ are two independent observations of X, and the random variable 2X.

You will see from the following example that the distributions of the two random variables X₁ + X₂ and 2X are very different.
Example 4.20 


When a tetrahedral die is thrown, the number on the face on which it lands, X, has the probability distribution shown, with E(X) = 2.5 and Var(X) = 1.25.

x          1      2      3      4
P(X = x)   0.25   0.25   0.25   0.25

(a) Find the probability distribution of S, the sum of the two numbers obtained when the die is thrown twice, where S = X₁ + X₂, and illustrate it by drawing a vertical line graph.
Find E(S) and Var(S).

(b) Find the probability distribution of D, where D is double the number on which the die lands when it is thrown once. Illustrate it by drawing a vertical line graph.
Find E(D) and Var(D).


Solution 4.20 


(a) Consider the sum when the die is thrown twice and illustrate the outcomes on a possibility space diagram.
S can take the values 2, 3, 4, 5, 6, 7, 8 and the outcomes (all equally likely) are shown in the diagram:

[Figure: possibility space diagram for S = X₁ + X₂, with first throw X₁ (1 to 4) on the horizontal axis and second throw X₂ (1 to 4) on the vertical axis]

The probability distribution of S is:

s          2      3      4      5      6      7      8
P(S = s)   1/16   2/16   3/16   4/16   3/16   2/16   1/16

[Figure: vertical line graph of P(S = s) against s for s = 2, ..., 8]

$E(S) = 5$ (by symmetry)
$E(S^2) = \sum s^2 P(S = s) = \tfrac{1}{16}(4 + 18 + 48 + 100 + 108 + 98 + 64) = 27.5$
$\mathrm{Var}(S) = E(S^2) - E^2(S) = 27.5 - 25 = 2.5$

As expected,
$E(S) = E(X_1 + X_2) = 2E(X) = 5$
$\mathrm{Var}(S) = \mathrm{Var}(X_1 + X_2) = 2\,\mathrm{Var}(X) = 2.5$
(b) D is double the number on which the die lands, so D = 2X.
The probability distribution for D is

d          2      4      6      8
P(D = d)   0.25   0.25   0.25   0.25

$E(D) = 5$ (by symmetry)
$E(D^2) = \sum d^2 P(D = d) = 0.25(4 + 16 + 36 + 64) = 30$
$\mathrm{Var}(D) = E(D^2) - E^2(D) = 30 - 25 = 5$

As expected,
$E(D) = E(2X) = 2E(X) = 5$
$\mathrm{Var}(D) = \mathrm{Var}(2X) = 4\,\mathrm{Var}(X) = 5$

Although the means of the two distributions are the same, 'double the number' has the greater variance.
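To see the contrast between X₁ + X₂ and 2X concretely, the sketch below builds both distributions for the tetrahedral die of Example 4.20 and computes their means and variances exactly. It is an illustrative check rather than part of the original solution, and it assumes the two throws are independent.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

die = {x: F(1, 4) for x in (1, 2, 3, 4)}   # tetrahedral die, Example 4.20


def mean(d):
    return sum(v * p for v, p in d.items())


def var(d):
    return sum(v * v * p for v, p in d.items()) - mean(d) ** 2


# S = X1 + X2: two independent throws
S = defaultdict(F)
for (x1, p1), (x2, p2) in product(die.items(), repeat=2):
    S[x1 + x2] += p1 * p2

# D = 2X: a single throw, doubled
D = {2 * x: p for x, p in die.items()}

print(mean(S), var(S))   # E(S), Var(S)
print(mean(D), var(D))   # E(D), Var(D)
```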
Summarising these results:

For multiples of X: $E(2X) = 2E(X)$, $\mathrm{Var}(2X) = 4\,\mathrm{Var}(X)$
For two independent observations of X: $E(X_1 + X_2) = 2E(X)$, $\mathrm{Var}(X_1 + X_2) = 2\,\mathrm{Var}(X)$

The expectations are the same, but the variances are not.
It is important that you understand whether multiples or sums are being considered. Think carefully about this point.
Exercise 4e Combinations of random variables

1. Independent random variables X and Y are such that E(X) = 4, E(Y) = 5, Var(X) = 1, Var(Y) = 2.
Find
(a) E(4X + 2Y),
(b) E(5X − Y),
(c) Var(3X + 2Y),
(d) Var(5Y − 3X),
(e) Var(3X − 5Y).


2. Independent random variables X and Y are such that E(X²) = 14, E(Y²) = 20, Var(X) = 10, Var(Y) = 11. Find
(a) E(3X − 2Y), (b) Var(5X − 2Y).


3. Independent random variables X and Y are such that E(X) = 3, E(X²) = 12, E(Y) = 4, E(Y²) = 18.
Find the value of
(a) E(3X − 2Y),
(b) E(2Y − 3X),
(c) E(6X + 4Y),
(d) Var(2X − Y),
(e) Var(2X + Y),
(f) Var(3Y + 2X).


4. Independent random variables X and Y have probability distributions as shown in the tables:

x          0     1     2     3
P(X = x)   0.3   0.2   0.4   0.1

y          0     1     2
P(Y = y)   0.4   0.2   0.4

(a) Find E(X), E(Y), Var(X), Var(Y).
(b) Construct the probability distribution for the random variable X + Y.
(c) Verify that E(X + Y) = E(X) + E(Y).
(d) Verify that Var(X + Y) = Var(X) + Var(Y).
(e) Construct the probability distribution for the random variable X − Y.
(f) Verify that E(X − Y) = E(X) − E(Y).
(g) Verify that Var(X − Y) = Var(X) + Var(Y).


5. Rods of length 2 m or 3 m are selected at random with probabilities 0.4 and 0.6 respectively.
(a) Find the expectation and variance of the length of a rod.
(b) Two lengths are now selected at random. Find the expectation and variance of the sum of the two lengths.
(c) Three lengths are now selected at random. Show that the probability distribution of Y, the sum of the three lengths, is

y          6       7       8       9
P(Y = y)   0.064   0.288   0.432   0.216

and find E(Y) and Var(Y). Comment on your results.


6. Find the variance of the sum of the scores when 
an ordinary die is thrown ten times. 


7. X has a p.d.f. given by P(X = x) = kx, x = 1, 2, 3, 4. Find
(a) k,
(b) E(X),
(c) Var(X),
(d) P(X₁ + X₂ = 5),
(e) E(4X),
(f) Var(X₁ + X₂ + X₃).

Summary


For the discrete random variable X with probability density function P(X = x):
$\sum P(X = x) = 1$
Cumulative distribution function $F(x) = P(X \le x)$
$\mu = E(X) = \sum x P(X = x)$, where μ is the expectation of X
$E(g(X)) = \sum g(x) P(X = x)$
$E(g(X) + h(X)) = E(g(X)) + E(h(X))$
$E(X^2) = \sum x^2 P(X = x)$
$\sigma^2 = \mathrm{Var}(X) = E((X - \mu)^2)$, where Var(X) is the variance of X
$= E(X^2) - \mu^2$ (or $\mathrm{Var}(X) = E(X^2) - E^2(X)$)
$= \sum x^2 P(X = x) - \mu^2$
σ = standard deviation of X = $\sqrt{\mathrm{Var}(X)}$

For the random variable X and constants a and b,
$E(a) = a$, $\mathrm{Var}(a) = 0$
$E(aX) = aE(X)$, $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
$E(aX + b) = aE(X) + b$, $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$

For any two random variables X and Y and constants a and b,
$E(X + Y) = E(X) + E(Y)$
$E(X - Y) = E(X) - E(Y)$
$E(aX + bY) = aE(X) + bE(Y)$
$E(aX - bY) = aE(X) - bE(Y)$

For independent random variables X and Y and constants a and b,
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$
$\mathrm{Var}(aX - bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$

For n independent observations of X,
$E(X_1 + X_2 + \dots + X_n) = nE(X)$
$\mathrm{Var}(X_1 + X_2 + \dots + X_n) = n\,\mathrm{Var}(X)$

For multiples of X,
$E(nX) = nE(X)$
$\mathrm{Var}(nX) = n^2\,\mathrm{Var}(X)$


Miscellaneous worked examples 


Example 4.21 


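The discrete random variable X has probability function

$P(X = x) = \begin{cases} \dfrac{kx}{x^2 + 1}, & x = 2, 3 \\[4pt] \dfrac{2kx}{x^2 - 1}, & x = 4, 5 \\[4pt] 0, & \text{otherwise} \end{cases}$

(a) Show that the value of k is 20/33.
(b) Find the probability that X is less than 3 or greater than 4.
(c) Find F(3.2).
(d) Find (i) E(X), (ii) Var(X). (L)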
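Solution 4.21
(a) When x = 2, $P(X = 2) = \dfrac{2k}{2^2 + 1} = \dfrac{2k}{5}$
When x = 3, $P(X = 3) = \dfrac{3k}{3^2 + 1} = \dfrac{3k}{10}$
When x = 4, $P(X = 4) = \dfrac{2k \times 4}{4^2 - 1} = \dfrac{8k}{15}$
When x = 5, $P(X = 5) = \dfrac{2k \times 5}{5^2 - 1} = \dfrac{10k}{24}$

Now $\sum P(X = x) = 1$, so
$\dfrac{2k}{5} + \dfrac{3k}{10} + \dfrac{8k}{15} + \dfrac{10k}{24} = 1$
$\dfrac{33k}{20} = 1$
$k = \dfrac{20}{33}$

Substituting this value for k, the probability distribution for X is

x          2      3      4       5
P(X = x)   8/33   2/11   32/99   25/99

(b) $P(X < 3 \text{ or } X > 4) = P(X = 2) + P(X = 5) = \dfrac{8}{33} + \dfrac{25}{99} = \dfrac{49}{99}$

(c) $F(3.2) = P(X \le 3.2) = P(X = 2) + P(X = 3) = \dfrac{8}{33} + \dfrac{2}{11} = \dfrac{14}{33}$

(d) (i) $E(X) = \sum x P(X = x) = 2 \times \dfrac{8}{33} + 3 \times \dfrac{2}{11} + 4 \times \dfrac{32}{99} + 5 \times \dfrac{25}{99} = \dfrac{355}{99} = 3\tfrac{58}{99}$

(ii) $E(X^2) = \sum x^2 P(X = x) = 4 \times \dfrac{8}{33} + 9 \times \dfrac{2}{11} + 16 \times \dfrac{32}{99} + 25 \times \dfrac{25}{99} = \dfrac{1395}{99} = 14\tfrac{1}{11}$
$\mathrm{Var}(X) = E(X^2) - E^2(X) = 14\tfrac{1}{11} - \left(3\tfrac{58}{99}\right)^2 = 1.23$ (3 s.f.)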
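The fractions in Solution 4.21 are easy to slip on, so a quick exact check may help. This sketch assumes the probability function is kx/(x² + 1) for x = 2, 3 and 2kx/(x² − 1) for x = 4, 5, as used in the working above; `weight` is a hypothetical name for the probability function with k removed.

```python
from fractions import Fraction as F


def weight(x):
    # Probability function of Example 4.21 with the constant k left out
    if x in (2, 3):
        return F(x, x ** 2 + 1)
    return F(2 * x, x ** 2 - 1)      # x = 4, 5


xs = [2, 3, 4, 5]
k = 1 / sum(weight(x) for x in xs)   # the probabilities must sum to 1
P = {x: k * weight(x) for x in xs}

EX = sum(x * p for x, p in P.items())
EX2 = sum(x * x * p for x, p in P.items())
var_X = EX2 - EX ** 2

print(k, {x: str(p) for x, p in P.items()})
print(P[2] + P[5], P[2] + P[3])      # parts (b) and (c)
print(EX, EX2, round(float(var_X), 2))
```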


Example 4.22

Anne plays a game in which a fair six-sided die is thrown once. If the score is 1, 2 or 3, Anne loses £10. If the score is 4 or 5, Anne wins £x. If the score is 6, Anne wins £2x.
(a) Show that the expectation of Anne’s profit is £(⅔x − 5) in a single game.
(b) Calculate the value of x for which, on average, Anne’s profit is zero.
(c) Given that x = 12, calculate the variance of Anne’s profit in a single game.

Solution 4.22
Let £X be Anne’s profit.
P(score 1, 2 or 3) = 3/6 = 1/2, therefore P(X = −10) = 1/2
P(score 4 or 5) = 2/6 = 1/3, therefore P(X = x) = 1/3
P(score 6) = 1/6, therefore P(X = 2x) = 1/6

The probability distribution for X is

Profit (£)    −10   x     2x
Probability   1/2   1/3   1/6

(a) $E(X) = \sum x P(X = x) = -10 \times \tfrac{1}{2} + x \times \tfrac{1}{3} + 2x \times \tfrac{1}{6} = -5 + \tfrac{x}{3} + \tfrac{x}{3} = \tfrac{2x}{3} - 5$
So the expectation of Anne’s profit in a single game is £(⅔x − 5).

(b) If E(X) = 0, then $\tfrac{2x}{3} - 5 = 0$, so $x = 7.5$.

(c) When x = 12, the probability distribution becomes

Profit (£)    −10   12    24
Probability   1/2   1/3   1/6

$E(X) = \tfrac{2}{3} \times 12 - 5 = 3$ from (a)
$E(X^2) = \sum x^2 P(X = x) = 100 \times \tfrac{1}{2} + 144 \times \tfrac{1}{3} + 576 \times \tfrac{1}{6} = 194$
$\mathrm{Var}(X) = E(X^2) - E^2(X) = 194 - 9 = 185$

The variance of Anne’s profit in a single game is 185 (£²).
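Anne’s game can be parametrised by the stake x. The sketch below, built around a hypothetical helper `profit_dist`, rebuilds the distribution for any positive x and checks the three parts of Solution 4.22: the formula ⅔x − 5, the break-even value x = 7.5, and the mean and variance when x = 12.

```python
from fractions import Fraction as F


def profit_dist(x):
    # Anne's profit in one game: -10 on 1-3, x on 4-5, 2x on 6 (fair die).
    # Assumes x > 0, so the three profit values are distinct keys.
    return {-10: F(1, 2), x: F(1, 3), 2 * x: F(1, 6)}


def mean(d):
    return sum(v * p for v, p in d.items())


def var(d):
    return sum(v * v * p for v, p in d.items()) - mean(d) ** 2


print(mean(profit_dist(F(15, 2))))                     # break-even stake x = 7.5
print(mean(profit_dist(12)), var(profit_dist(12)))     # part (c)
print(all(mean(profit_dist(x)) == F(2, 3) * x - 5 for x in range(1, 50)))
```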
Example 4.23 
Any integer may be reduced to a single digit by the method illustrated below.

51 → 5 + 1 → 6
58 → 5 + 8 → 13 → 1 + 3 → 4

The random variable D denotes the digit that results from the reduction of an integer, selected at random, from the twenty integers 50, 51, 52, ..., 69.
(a) Show that P(D = 5) = 0.15.
(b) Determine the probability for each of the other possible values of D.
(c) Calculate the expected value of D.
(d) Calculate, to two decimal places, the variance of D. (NEAB)
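Solution 4.23
To calculate D, consider the following:

50 → 5 + 0 → 5 *          60 → 6 + 0 → 6
51 → 5 + 1 → 6            61 → 6 + 1 → 7
52 → 5 + 2 → 7            62 → 6 + 2 → 8
53 → 5 + 3 → 8            63 → 6 + 3 → 9
54 → 5 + 4 → 9            64 → 10 → 1 + 0 → 1
55 → 10 → 1 + 0 → 1       65 → 11 → 1 + 1 → 2
56 → 11 → 1 + 1 → 2       66 → 12 → 1 + 2 → 3
57 → 12 → 1 + 2 → 3       67 → 13 → 1 + 3 → 4
58 → 13 → 1 + 3 → 4       68 → 14 → 1 + 4 → 5 *
59 → 14 → 1 + 4 → 5 *     69 → 15 → 1 + 5 → 6

Three integers out of the twenty reduce to 5. These have been marked * in the list above.

(a) P(D = 5) = 3/20 = 0.15

(b) Counting the other values in the list gives

d          1      2      3      4      5      6      7      8      9
P(D = d)   2/20   2/20   2/20   2/20   3/20   3/20   2/20   2/20   2/20

(c) $E(D) = \sum d P(D = d) = \tfrac{1}{20}(2 + 4 + 6 + 8 + 15 + 18 + 14 + 16 + 18) = \tfrac{101}{20} = 5.05$

(d) $E(D^2) = \sum d^2 P(D = d) = \tfrac{1}{20}(2 + 8 + 18 + 32 + 75 + 108 + 98 + 128 + 162) = \tfrac{631}{20} = 31.55$
$\mathrm{Var}(D) = E(D^2) - E^2(D) = 31.55 - 5.05^2 = 6.05$ (2 d.p.)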
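Solution 4.23 depends on listing each reduction correctly. A short Python sketch can repeat the digit summing for every integer from 50 to 69, count the results and recompute the mean and variance exactly; `reduce_digits` is an illustrative name for the method described in the example.

```python
from collections import Counter
from fractions import Fraction as F


def reduce_digits(n):
    # Replace n by the sum of its digits until a single digit remains
    while n >= 10:
        n = sum(int(ch) for ch in str(n))
    return n


counts = Counter(reduce_digits(n) for n in range(50, 70))   # 50, 51, ..., 69
P = {d: F(c, 20) for d, c in sorted(counts.items())}

E_D = sum(d * p for d, p in P.items())
E_D2 = sum(d * d * p for d, p in P.items())
var_D = E_D2 - E_D ** 2

print({d: str(p) for d, p in P.items()})
print(float(E_D), float(E_D2), round(float(var_D), 2))
```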


Miscellaneous exercise 4f




1. Fertiliser is sold in 25 kg sacks. The probability that a sack is underweight by Y kg, measured to the nearest 0.1 kg, is given in the table below.

Y (kg)        0.1   0.2   0.3   0.4     0.5
Probability   0.5   0.3   0.1   0.075   0.025

Find the expected loss in weight per sack.

The price quoted by the manufacturers to a farmer for 1000 kg of fertiliser, packed in 25 kg sacks, is £240. Estimate by how much this price exceeds the value of the fertiliser that would actually be supplied to the farmer.


2. A discrete random variable X can take only the values 0, 1, 2 or 3, and its probability distribution is given by P(X = 0) = 2k, P(X = 1) = 3k, P(X = 2) = 4k, P(X = 3) = 5k, where k is a constant. Find
(a) the value of k,
(b) the mean and variance of X. (NEAB)

3. A random variable R takes the integer value r with probability P(r) where
P(r) = kr, r = 1, 2, 3, 4,
P(r) = 0, otherwise.
(a) Find the value of k, and display the distribution on graph paper.
(b) Find the mean and the variance of the distribution.
(c) Find the mean and the variance of 5R − 3. (L)


4. A curiously shaped six-faced die produces a score, X, for which the probability distribution is given in the following table.


Show that the constant k is 29. Find the mean 
and variance of X. 

The die is thrown twice. Show that the 
probability of obtaining equal scores is 
approximately }. 


(MEI) 


5. A random variable R takes the integer value r with probability P(r) defined by


=1,2,3 
Pi) =k, r=1,2,3, 
Py=k7-1%, = 4,526 
P(r) =0, otherwise. 


Find the value of k and the mean and variance of the probability distribution. Exhibit this distribution by a suitable diagram.
Determine the mean and the variance of the variable Y where Y = 4R − 2. (L)
6. A discrete random variable X has the 
distribution function 
x 1 2 4 5 
F(x) a t G i 
(a) Write down the probability distribution of X.
(b) Find the probability distribution of the sum of two independent observations from X, and find the mean and variance of the distribution of this sum.


7. The probability of there being X unusable matches in a full box of Surelite matches is given by P(X = 0) = 8k, P(X = 1) = 5k, P(X = 2) = P(X = 3) = k, P(X ≥ 4) = 0.
Determine the constant k and the expectation and variance of X.
Two full boxes of Surelite matches are chosen at random and the total number Y of unusable matches is determined. Calculate P(Y > 4), and state the values of the expectation and variance of Y. (C)


8. Two unbiased four-sided dice, having the numbers 1, 2, 3 and 4 on their faces, are thrown together. The random variable D represents the modulus of the difference between the numbers on the two hidden faces.
(a) Show that P(D = 1) = 3/8.
(b) Calculate the probability for each of the other possible values of D.
(c) Calculate the expected value of D. (NEAB)


9. A and B play a series of tennis matches. The probability that A wins any single match in the series is 0.6. The winner of the series is the first player to win either two matches in succession or a total of three matches. Show that the probability
(a) that the series lasts exactly two matches is 0.52,
(b) that the series lasts exactly three matches is 0.24.
Calculate the probability that the series lasts exactly four matches.
Hence, or otherwise, show that the probability that the series lasts five matches is 0.1152.
Calculate the expectation of n, the number of matches in the series.
The prize-money involved depends on n and is shown in the table below.

n             2       3       4 or 5
Prize-money   £1000   £1240   £1510

Tickets are sold, each of which entitles the purchaser to see the whole series of matches. Given that each ticket costs £5, calculate the number of tickets which must be sold to cover the expected value of the prize-money. (C)

10. A fair cubical die has two yellow faces and four blue faces. The die is rolled repeatedly until a yellow face appears uppermost or the die has been rolled four times. The random variable B represents the number of times a blue face appears uppermost and the random variable R represents the number of times the die is rolled.
(a) Show that P(B = 3) = 8/81.
(b) Find the probability distribution of B.
(c) Find E(B).
(d) Show that P(R = 4) = 8/27.
(e) Find P(R = B). (L)

11. The discrete random variable X has the probability distribution given in the following table.

x          1     2     3     4
P(X = x)   0.4   0.3   0.1   0.2

Two independent observations of X are made. The value of the random variable Y is found by subtracting the smaller of the two values of X from the larger. If the two values of X are equal, Y is zero. Show that P(Y = 1) = 0.34 and tabulate the complete probability distribution of Y.
Find
(a) E(Y),
(b) Var(Y),
(c) P(Y > E(Y)). (C)

12. A box contains five discs, labelled 1, 2, 4, 5 and 6. In a game a player draws a disc at random, replaces it and then draws again. The player’s score is the sum of the numbers on the two discs drawn.
Construct a table showing the 11 possible scores and their probabilities. Find the expected score.
In a social club this game is played and the prize is £1 for each point scored. The players pay £7.50 each time they play. Find the expected profit to the club after 250 games have been played. (C)

13. On a long train journey, a statistician is invited by a gambler to play a dice game. The game uses two ordinary dice which the statistician is to throw. If the total score is 12, the statistician is paid £6 by the gambler. If the total score is 8, the statistician is paid £3 by the gambler. However, if both or either dice show a 1, the statistician pays the gambler £2. Let £X be the amount paid to the statistician by the gambler after the dice are thrown once.
Determine the probability that (a) X = 6, (b) X = 3, (c) X = −2.
Find the expected value of X and show that, if the statistician played the game 100 times, his expected loss would be £2.78, to the nearest penny.
Find the amount, £a, that the £6 would have to be changed to in order to make the game unbiased.

14. The discrete random variable X can take only the values 0, 1, 2, 3, 4, 5. The probability distribution of X is given by the following, where a and b are constants.
P(X = 0) = P(X = 1) = P(X = 2) = a
P(X = 3) = P(X = 4) = P(X = 5) = b
P(X > 2) = 3P(X < 2)
(a) Determine the values of a and b.
(b) Show that the expectation of X is 4 and determine the variance of X.
(c) Determine the probability that the sum of two independent observations from this distribution exceeds 7. (C)

15. A gambling machine works in the following way. The player inserts a penny into one of five slots, which are coloured Blue, Red, Orange, Yellow and Green corresponding to five coloured light bulbs. The player can choose whichever coloured slot he likes. After the penny has been inserted one of the five bulbs lights up. If the bulb lit up is the same colour as the slot selected by the player, then the player wins and receives from the machine R pennies, where
P(R = 2) = …, P(R = 4) = …,
P(R = 6) = …, and
P(R = 8) = P(R = 10) = …
If the colour of the bulb lit up and the slot selected are not the same, the player receives nothing from the machine. In either case the player does not get back the penny that he inserted. Assuming that each of the colours is equally likely to light up, and that the machine selects the bulbs at random, determine
(a) the probability that the player receives nothing from the machine,
(b) the expected value of the amount gained by the player from a single try,
(c) the variance of the amount gained by the player from a single try. (C)


16. (a) A regular customer at a small clothes shop observes that the number of customers, X, in the shop when she enters has the following probability distribution.

Number of customers, x   0      1      2      3      4
Probability, p(x)        0.15   0.34   0.27   0.14   0.10

Find the mean and standard deviation of X.

(b) She also observes that the average waiting time, Y, before being served, is as follows.

Number of customers, x            0   1   2   3   4
Average waiting time, y minutes   0   2   6   9   12

Find her mean waiting time. (AEB)

17. During winter a family requests four bottles of milk every day, and these are left on the doorstep. Three of the bottles have silver tops and the fourth has a gold top. A thirsty blue-tit attempts to remove the tops from these bottles. The probability distribution of X, the number of silver tops removed by the blue-tit, is the same each day and is given by
P(X = 0) = …   P(X = 1) = …
P(X = 2) = …   P(X = 3) = …
The blue-tit finds the gold top particularly attractive, and the probability that this top is removed is …, independent of the number of silver tops removed. Determine the expectation and variance of
(a) the number of silver tops removed in a day,
(b) the number of gold tops removed in a day,
(c) the total number of tops (silver and gold) removed in seven days.
Find also the probability distribution of the total number of tops (silver and gold) removed in a day. (C)

18. A player throws a die whose faces are numbered 1 to 6 inclusive. If the player obtains a six he throws the die a second time, and in this case his score is the sum of 6 and the second number; otherwise his score is the number obtained. The player has no more than two throws.
Let X be the random variable denoting the player’s score. Write down the probability distribution of X, and determine the mean of X.
Show that the probability that the sum of two successive scores is 8 or more is 17/36.
Determine the probability that the first of two successive scores is 7 or more, given that their sum is 8 or more.


Mixed test 4A 
1. A discrete random variable X can only take the values shown in the following probability distribution.

x             1      3     6   n      12
Probability   0.14   0.3   k   0.25   0.15

(a) Find the value of k.
Given that E(X) = 6.0, find
(b) the value of n,
(c) the variance of X. (C)


2. When a certain type of cell is subjected to radiation, the cell may die, survive as a single cell or divide into two cells with probabilities …, … and … respectively.
Two cells are independently subjected to radiation. The random variable X represents the total number of cells in existence after this experiment.
(a) Show that P(X = 2) = ….
(b) Find the probability distribution for X.
(d) Show that Var(X) = ….
Another two cells are submitted to radiation in a similar experiment and the random variable Y represents the total number of cells in existence after this experiment. The random variable Z is …
(e) Find E(Z) and Var(Z). (L)

3. In a game two fair, cubical dice with faces numbered 1 to 6 are thrown. The score for the game is the positive difference between the numbers uppermost on the two dice.
(a) Tabulate the probability distribution for the score.
(b) Calculate the expected value of the score.
(c) State the probability that the score is less than the expected value. (NEAB)

Mixed test 4B

1. The discrete random variable X has the probability function shown in the table below.

x          1     2     3     4     5
P(X = x)   0.2   0.3   0.3   0.1   0.1

Find
(a) P(2 < X < 4),
(b) F(3.7),
(c) E(X),
(d) Var(X),
(e) E(X² + 4X − 3). (L)
2. A box contains six discs, of which two are labelled 2, three are labelled 3 and one is labelled 6. A game consists of a player drawing two discs simultaneously from the box. The sum of the numbers on the two discs is denoted by X.
(a) Find the probability distribution of X.
(b) Find E(X), E(X²) and the variance of X.

A player pays £20 for 30 is pai 
c po value of X he Shain ere 
c) Calculate the expected i 
: rane en fet ed profit or loss for 30 
alculate the value of k i 
would be fair. siileailas rc) 


3. An unbiased four-sided die has faces numbered 1, 2, 3 and 6. The die and a fair coin are thrown together. The random variable R denotes the number on the hidden face of the die. If the coin shows heads, the score recorded, S, is equal to …
(a) Tabulate the probability distribution of S.
(b) Calculate the expected value of S.
(c) Calculate the variance of S. (NEAB)


Special discrete probability distributions

In this chapter you will learn
• about the conditions needed to model a situation for a discrete variable using
  – a uniform distribution
  – a geometric distribution, X ~ Geo(p)
  – a binomial distribution, X ~ B(n, p)
  – a Poisson distribution, X ~ Po(λ)
• how to calculate probabilities for these distributions and also the mean and variance
• about the use of the Poisson distribution as an approximation to the binomial distribution
• about the distribution of the sum of two or more independent Poisson variables


THE UNIFORM DISTRIBUTION

Throw an ordinary die. The probability distribution of X, the number on the die, is shown in the table and illustrated by the vertical line graph.

x          1     2     3     4     5     6
P(X = x)   1/6   1/6   1/6   1/6   1/6   1/6

[Figure: vertical line graph of P(X = x) against x for x = 1, ..., 6]

$P(X = x) = \tfrac{1}{6}$ for $x = 1, 2, 3, 4, 5, 6$

This is an example of a discrete uniform distribution.

Conditions for a uniform model

For a situation to be described using a discrete uniform model,
• the discrete random variable X is defined over the set of n distinct values $x_1, x_2, x_3, \dots, x_n$,
• each value is equally likely to occur and
$P(X = x_r) = \dfrac{1}{n}$ for $r = 1, 2, 3, \dots, n$
n 


Example 5.1

The discrete variable X is such that P(X = x) = c for x = 20, 30, 45, 50. Find
(a) the probability distribution of X,
(b) μ, the expectation of X,
(c) P(X < μ),
(d) σ, the standard deviation of X.

Solution 5.1

(a)
x          20   30   45   50
P(X = x)   c    c    c    c

$\sum P(X = x) = 1$
$4c = 1$
$c = 0.25$

$P(X = x) = 0.25$ for $x = 20, 30, 45, 50$

NOTE: There are four values, each of which is equally likely to occur, and $P(X = x_r) = \tfrac{1}{4} = 0.25$.

(b) $\mu = E(X) = \sum x P(X = x) = 20 \times 0.25 + 30 \times 0.25 + 45 \times 0.25 + 50 \times 0.25 = 36.25$

(c) $P(X < \mu) = P(X < 36.25) = P(X = 20) + P(X = 30) = 0.25 + 0.25 = 0.5$

(d) $E(X^2) = \sum x^2 P(X = x) = 0.25(20^2 + 30^2 + 45^2 + 50^2) = 1456.25$
$\mathrm{Var}(X) = E(X^2) - \mu^2 = 1456.25 - 36.25^2 = 142.1875$
$\sigma = \sqrt{142.1875} = 11.9$ (3 s.f.)
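For a discrete uniform distribution each of the n values has probability 1/n, so Example 5.1 can be reproduced in a few lines. This is a minimal sketch using exact fractions; the variable names are illustrative.

```python
import math
from fractions import Fraction as F

xs = [20, 30, 45, 50]
c = F(1, len(xs))                          # uniform: each value has probability 1/n
mu = sum(x * c for x in xs)                # E(X)
var = sum(x * x * c for x in xs) - mu ** 2 # E(X^2) - mu^2
p_below = sum(c for x in xs if x < mu)     # P(X < mu)

print(float(mu), p_below, float(var), round(math.sqrt(var), 1))
```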


THE GEOMETRIC DISTRIBUTION

Plastic rabbits are given away in packets of breakfast cereal. The probability that a packet contains a rabbit is 0.1. Consider the probability distribution of X, the number of packets you open until you get a rabbit.
$P(X = 1) = P(\text{first packet contains a rabbit}) = 0.1$
$P(X = 2) = P(\text{first doesn't, second packet does}) = 0.9 \times 0.1 = 0.09$
$P(X = 3) = P(\text{first doesn't, second doesn't, third packet does}) = 0.9 \times 0.9 \times 0.1 = 0.081$

Similarly
$P(X = 4) = 0.9 \times 0.9 \times 0.9 \times 0.1 = (0.9)^3 \times 0.1$
$P(X = 5) = (0.9)^4 \times 0.1$
$P(X = 6) = (0.9)^5 \times 0.1$

and so on. A geometric model is being used in this example.


Conditions for a geometric model

For a situation to be described using a geometric model,
• independent trials are carried out,
• the outcome of each trial is deemed either a success or a failure,
• the probability, p, of a successful outcome is the same for each trial.

The discrete random variable, X, is the number of trials needed to obtain the first successful outcome.
If the above conditions are satisfied, X is said to follow a geometric distribution. This is written
X ~ Geo(p)
The probability of success, p, is all that is needed to describe the distribution completely. It is known as the parameter of the distribution.
Writing P(failure) as q, where q = 1 − p:

if X ~ Geo(p), the probability that the first success is obtained at the rth attempt is P(X = r), where
$P(X = r) = q^{r-1}p$ for $r = 1, 2, 3, 4, \dots$
so that
$P(X = 1) = p$
$P(X = 2) = qp$
$P(X = 3) = q^2p$ and so on.

NOTE:
• X cannot take the value 0,
• the number of trials could be infinite, although this is unlikely in practice!
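The formula $P(X = r) = q^{r-1}p$ translates directly into code. The following minimal Python sketch (the function name `geo_pmf` is hypothetical) reproduces the cereal-packet probabilities for p = 0.1 and checks numerically that the probabilities add up to 1, using a long finite range of r in place of the infinite one.

```python
def geo_pmf(r, p):
    """P(X = r) for X ~ Geo(p): r - 1 failures followed by a first success."""
    if r < 1:
        return 0.0                      # X cannot take the value 0
    return (1 - p) ** (r - 1) * p


# Cereal-packet example: p = 0.1
for r in range(1, 7):
    print(r, round(geo_pmf(r, 0.1), 6))

# The probabilities over r = 1, 2, 3, ... sum to 1 (checked on a long finite range)
print(sum(geo_pmf(r, 0.1) for r in range(1, 1000)))
```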


Here are some diagrammatic illustrations of geometric distributions: 


[Figure: vertical line graphs of P(X = x) against x for X ~ Geo(0.3), X ~ Geo(0.5) and X ~ Geo(0.8)]


The mode of the geometric distribution

From the diagrams, you can see that the mode of any geometric distribution is 1. This means that for any value of p, one attempt is the most likely number of attempts needed to obtain the first success. This is quite a surprising result. To see why it holds:
$P(X = 1) = p$
$P(X = 2) = qp$
Since $0 < q < 1$, $qp < p$.
Also $P(X = 3) = q^2p < qp < p$ and so on.
For example, if X ~ Geo(0.3),
$P(X = 1) = 0.3$
$P(X = 2) = 0.7 \times 0.3 = 0.21 < 0.3$
$P(X = 3) = 0.7^2 \times 0.3 = 0.147 < 0.21 < 0.3$
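The claim that the mode is always 1 can be checked numerically for the three distributions pictured. This sketch lists P(X = r) for r = 1, ..., 10 and reports the position of the largest probability; it illustrates particular values of p and is not a proof (the inequality argument above is the proof).

```python
def geo_pmf(r, p):
    # P(X = r) = q^(r-1) p for X ~ Geo(p)
    return (1 - p) ** (r - 1) * p


modes = {}
for p in (0.3, 0.5, 0.8):
    probs = [geo_pmf(r, p) for r in range(1, 11)]
    modes[p] = 1 + probs.index(max(probs))   # position of the largest probability
    print(p, modes[p], [round(v, 3) for v in probs[:3]])
```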


Example 5.2

Jack is playing a board game in which he needs to throw a six with an ordinary die in order to start the game. Find the probability that
(a) exactly four attempts are needed to obtain a six,
(b) at least two attempts are needed,
(c) he is successful in throwing a six in three or fewer attempts,
(d) he needs more than three attempts to obtain a six.

Solution 5.2

X is the number of attempts needed to obtain a six. Using a geometric model with $p = \tfrac{1}{6}$, $q = \tfrac{5}{6}$, X ~ Geo(1/6).
(a) $P(X = 4) = q^3p = \left(\tfrac{5}{6}\right)^3 \times \tfrac{1}{6} = 0.096$ (2 s.f.)
(b) $P(X \ge 2) = 1 - P(X = 1) = 1 - \tfrac{1}{6} = \tfrac{5}{6}$
(c) $P(X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) = p + qp + q^2p$
$= \tfrac{1}{6} + \tfrac{5}{6} \times \tfrac{1}{6} + \left(\tfrac{5}{6}\right)^2 \times \tfrac{1}{6} = 0.42$ (2 s.f.)

Alternatively,
$P(X \le 3) = P(\text{success at some trial in the first three trials})$
$= 1 - P(\text{no success in first three trials}) = 1 - q^3 = 1 - \left(\tfrac{5}{6}\right)^3 = 0.42$ (2 s.f.)

(d) $P(X > 3) = P(\text{no success in first three trials}) = q^3 = \left(\tfrac{5}{6}\right)^3 = 0.58$ (2 s.f.)

These two results, illustrated above, are very useful:

If X ~ Geo(p) and q = 1 − p:
$P(X \le x) = 1 - q^x$
$P(X > x) = q^x$

Example 5.3

On a particular production line the probability that an item is faulty is 0.08. In a quality control test, items are selected at random from the production line. It is assumed that the quality of an item is independent of that of other items.
(a) Find the probability that the first faulty item
(i) does not occur in the first six selected,
(ii) occurs in fewer than five selections.
(b) There is to be at least a 90% chance of picking a faulty item on or before the nth attempt. What is the smallest value of n?

Solution 5.3

X is the number of items picked until a faulty one is selected.
Using a geometric model with p = 0.08, q = 0.92, X ~ Geo(0.08).
(a) (i) $P(X > 6) = q^6 = 0.92^6 = 0.61$ (2 s.f.)
(ii) $P(X < 5) = P(X \le 4) = 1 - q^4 = 1 - 0.92^4 = 0.28$ (2 s.f.)
(b) You need to find n such that $P(X \le n) \ge 0.9$.
But $P(X \le n) = 1 - q^n = 1 - 0.92^n$
So $1 - 0.92^n \ge 0.9$
$0.1 \ge 0.92^n$
$0.92^n \le 0.1$
By trial and improvement,
$0.92^{25} = 0.124\ldots > 0.1$
$0.92^{26} = 0.114\ldots > 0.1$
$0.92^{27} = 0.105\ldots > 0.1$
$0.92^{28} = 0.096\ldots < 0.1$
The smallest value of n is 28.

If you have studied logarithms in Pure Mathematics, you could use them to find n:
$0.92^n \le 0.1$
$n \log 0.92 \le \log 0.1$ (taking logs to base 10 of both sides)
$n \ge \dfrac{\log 0.1}{\log 0.92}$ (log 0.92 is negative, and dividing by a negative quantity reverses the inequality)
i.e. $n \ge 27.6\ldots$
The smallest value of n is 28, as before.
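Both methods in Solution 5.3(b) are easy to automate. This minimal sketch assumes the same values, q = 0.92 and a 90% target; the loop mirrors trial and improvement, while `math.ceil` rounds the logarithm bound up to the next whole number.

```python
import math

q = 0.92        # probability that an item is not faulty
target = 0.9    # required value of P(X <= n)

# Trial and improvement: increase n until 1 - q^n reaches the target
n = 1
while 1 - q ** n < target:
    n += 1

# Logarithms: q^n <= 1 - target  <=>  n >= log(1 - target) / log(q),
# the inequality reversing because log(q) < 0
n_log = math.ceil(math.log(1 - target) / math.log(q))

print(n, n_log)
```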


EXPECTATION AND VARIANCE OF THE GEOMETRIC DISTRIBUTION
If X ~ Geo(p),
$E(X) = \dfrac{1}{p}, \qquad \mathrm{Var}(X) = \dfrac{q}{p^2}$
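The formulas E(X) = 1/p and Var(X) = q/p² can be checked numerically by summing the series for E(X) and E(X²) far enough that the neglected tail is negligible. The helper `geo_moments` below is a hypothetical illustration; truncating at 2000 terms is an assumption that works for the values of p tried here but would need more terms for very small p.

```python
def geo_moments(p, terms=2000):
    """Approximate (E(X), Var(X)) for X ~ Geo(p) by truncating the series."""
    q = 1 - p
    e1 = sum(r * q ** (r - 1) * p for r in range(1, terms))
    e2 = sum(r * r * q ** (r - 1) * p for r in range(1, terms))
    return e1, e2 - e1 ** 2


for p in (0.2, 0.55, 0.9):
    e, v = geo_moments(p)
    print(p, round(e, 6), round(1 / p, 6), round(v, 6), round((1 - p) / p ** 2, 6))
```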
Example 5.4

When I make a telephone call to an office, the probability of not getting through is 0.45. If I do not get through, then I try again later. Let X denote the number of attempts I have to make in order to get through. Stating any necessary assumptions, identify the probability distribution of X. Hence, calculate
(a) P(X ≥ 4),
(b) E(X) and Var(X).

Solution 5.4

X is the number of attempts I have to make in order to get through. Assuming that the attempts are independent and the probability of getting through is the same for each attempt, X follows a geometric distribution with p = 0.55, q = 0.45.

(a) $P(X \ge 4) = P(X > 3) = q^3 = 0.45^3 = 0.091$ (2 s.f.)
(b) $E(X) = \dfrac{1}{p} = \dfrac{1}{0.55} = 1.8$ (2 s.f.)
$\mathrm{Var}(X) = \dfrac{q}{p^2} = \dfrac{0.45}{0.55^2} = 1.5$ (2 s.f.)
Example 5.5 


Identical independent trials of an experiment are carried out. The probability of a successful 
outcome is p. On average, five trials are required until a successful outcome occurs. 

(a) Find the value of p. 

(b) Find the probability that the first successful outcome occurs on the fifth trial. 


Solution 5.5 


X is the number of trials up to and including the first success. 
X ~ Geo (p) and E(X)=5. 


(a) $E(X) = \dfrac{1}{p} = 5$
$p = 0.2$

(b) X ~ Geo(0.2), i.e. p = 0.2, q = 0.8
$P(X = 5) = q^4p = 0.8^4 \times 0.2 = 0.08192$


Example 5.6
X ~ Geo(p) and it is known that P(X = 2) = 0.21 and p < 0.5. Find P(X = 1).

Solution 5.6
$P(X = 2) = qp$, where $q = 1 - p$
$0.21 = (1 - p)p$
$0.21 = p - p^2$
$p^2 - p + 0.21 = 0$
$(p - 0.3)(p - 0.7) = 0$
$p = 0.3$ or $p = 0.7$
Since p < 0.5, p = 0.3, and so
$P(X = 1) = p = 0.3$
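The quadratic in Solution 5.6 can also be solved with the quadratic formula. This sketch is illustrative only: it finds both roots of $p^2 - p + 0.21 = 0$ and then applies the condition p < 0.5 to select one.

```python
import math

# P(X = 2) = q p = (1 - p) p = 0.21  rearranges to  p^2 - p + 0.21 = 0
a, b, c = 1.0, -1.0, 0.21
disc = b * b - 4 * a * c
roots = sorted([(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)])

p = next(r for r in roots if r < 0.5)    # the condition p < 0.5 picks one root
print([round(r, 6) for r in roots], round(p, 6))   # P(X = 1) = p
```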


Exercise 5a The uniform and geometric distributions

1. The probability distribution for the random variable X is shown in the table.

x          6   7   8   9   10
P(X = x)   a   a   a   a   a

Find
(a) the value of a,
(b) the mean of X,
(c) the probability that X is smaller than the mean.


2. The random variable X is Geo(0.35). Calculate
(a) P(X = 4), (b) P(X > 4),
(c) P(X < 3), (d) E(X).


3. A coin is biased so that the probability of obtaining a head is 0.6.
The random variable X is the number of tosses up to and including the first head. Find
(a) P(X < 4),
(b) P(X > 5),
(c) the most likely number of tosses until a head is obtained,
(d) the expected number of tosses until a head is obtained,
(e) the expected number of tosses until a tail is obtained.


4. A sixth former is waiting for a bus to take him
into town. He passes the time by counting the
number of buses, up to and including the one
that he wants, that come along his side of the
road. If 30% of the buses travelling on that side of the
road go to town, what is
(a) the most likely count he makes to the arrival
of one that will take him into town,
(b) the probability that he will count at least
four buses?

5. During January the probability that it will rain
on any given day is 0.5.
Stating a necessary assumption, find the
probability that
(a) the first rainy day in January is on
5 January,
(b) it does not rain before 8 January.


6. A random number machine generates random
digits between 0 and 9. Each of the ten digits is
equally likely to be generated.
(a) X is the value of the digit generated.
Find
(i) P(X < 6),
(ii) P(X > 7),
(iii) E(X),
(iv) the standard deviation of X.
(b) X is the number of digits generated to the
first occurrence of a 5.
Find
(i) the probability that the first occurrence
of the digit 5 is at the seventh number
generated,
(ii) the most likely number of digits
generated to obtain a 5,
(iii) the mean number of digits generated to
obtain a 5.


7. X ~ Geo(0.5). Find
(a) the mode,
(b) the mean of X,
(c) the standard deviation of X.


8. A darts player practises throwing a dart at the
bull’s eye on a dart board. Independently for
each throw, her probability of hitting the bull’s
eye is 0.2. Let X be the number of throws she
makes, up to and including her first success.
(a) Find the probability that she is successful for
the first time on the third throw.
(b) Write down the distribution of X and give
the name of the distribution.
(c) Write down the probability that she will
have at least three failures before her first
success. (L)


9. The random variable X follows the geometric
distribution with probability p = 0.3.
(a) Write down the probability P(X = 4).
(b) Carefully explain why P(X = r)
is 0.7ʳ⁻¹ × 0.3.
(c) Describe in words a situation that has
probability 0.7⁷. (O)


10. X ~ Geo(p) and the probability that the first
success is obtained on the second attempt is 
0.1275. If p > 0.5, find P(X > 2). 


11. The probability that a telephone box is occupied
is 0.2. Find, to two significant figures, the 
probability that a person wishing to make a 
telephone call will find a telephone box which is 
not occupied only at the sixth box tried. (L) 


12. An unbiased coin is tossed repeatedly until a tail
appears. Find the expected number of tosses.

13. In a computer game, the probability that the
player hits the target is 0.4 for each attempt and
the result of each attempt is independent of all
others. Find
(a) the probability that he hits the target for the
first time on the fourth attempt,
(b) the mean number of attempts needed to hit
the target,
(c) the standard deviation of the number of
attempts,
(d) the most likely number of attempts to hit the
target,
(e) the probability that he takes more than
seven attempts to hit the target.


14. Alice runs a stall at a fete in which each player is
guaranteed to win £10. Players pay a certain
amount each time they throw a die and must
keep throwing the die until a four occurs. When
a four is obtained, Alice gives the player £10.

On average Alice expects to make a profit of 50p
per game. How much does she charge per throw?


15. During the winter in Glen Shee, the probability
that snow will fall on any given day is 0.1.
Taking 1 November as the first day of winter
and assuming independence from day to day,
find, to two significant figures, the probability
that the first snow of winter will fall in Glen Shee
on the last day of November (30th).

Given that no snow has fallen at Glen Shee
during the whole of November, a teacher decides
not to wait any longer to book a skiing holiday.
The teacher decides to book for the earliest date
for which the probability that snow will have
fallen on or before that date is at least 0.9. Find
the date of the booking. (L)


16. In many board games it is necessary to ‘throw a
six with an ordinary die’ before a player can start
the game. Write down, as a fraction, the
probability of a player
(a) starting on his first attempt,
(b) not starting until his third attempt,
(c) requiring more than three attempts before
starting.
What is
(d) the most common number of throws
required to obtain a six,
(e) the mean number of throws required to
obtain a six?
Prove that the probability of a player requiring
more than n attempts before starting is (5/6)ⁿ.
(f) What is the smallest value of n if there is to
be at least a 95% chance of starting on or
before the nth attempt? (O)


THE BINOMIAL DISTRIBUTION 


In a particular population, 10% of people have blood type B. If three people are selected at random from the population, what is the probability that exactly two of them have blood type B?

Since the people are selected at random, assume that the blood type of one person is independent of that of another, so P(type B) = P(B) = 0.1 and P(not type B) = P(B′) = 0.9.

Consider the tree diagram.

[Figure: tree diagram showing the outcomes B and B′ for the first, second and third person.]

P(B, B, B′) = 0.1 × 0.1 × 0.9 = 0.009
P(B, B′, B) = 0.1 × 0.9 × 0.1 = 0.009
P(B′, B, B) = 0.9 × 0.1 × 0.1 = 0.009

P(exactly two type B) = P(B, B, B′) + P(B, B′, B) + P(B′, B, B)
                      = 3 × 0.009
                      = 0.027

Now consider the situation when eight people are selected. What is the probability that exactly two of the eight people will have blood type B?

You could extend your tree, but it would become very complicated. It is possible to find the probability as follows, since you want two with type B and six who do not have blood type B.

P(choose 6 B′ then 2 B) = P(B′, B′, B′, B′, B′, B′, B, B) = 0.9⁶ × 0.1²

But there are several arrangements of this sequence, for example B′, B′, B, B′, B′, B′, B, B′, each with a probability of occurring of 0.9⁶ × 0.1².

The number of different arrangements is given by ⁸C₂, sometimes written (8 choose 2) (see page 215).
This can be found directly on your calculator using the nCr key (you may have to press SHIFT first):

On calculator: [8] [nCr] [2] [=]

Otherwise use the formula ⁿCᵣ = n! / (r!(n − r)!).

You should find that ⁸C₂ = 28. So there are 28 different arrangements of two who have blood type B and six who do not have blood type B.

Therefore P(exactly 2 have type B) = 28 × 0.9⁶ × 0.1² = 0.15 (2 s.f.)

Using a similar argument, you could find the probability that exactly two have blood type B in a randomly selected group of 12 people. In this case, ten will not have type B and

P(exactly 2 have type B) = ¹²C₂ × 0.9¹⁰ × 0.1² = 0.23 (2 s.f.)


The above three situations have been described using a binomial model. 


Conditions for a binomial model

For a situation to be described using a binomial model,
• a finite number, n, of trials is carried out,
• the trials are independent,
• the outcome of each trial is deemed either a success or a failure,
• the probability, p, of a successful outcome is the same for each trial.

The discrete random variable, X, is the number of successful outcomes in n trials.
If the above conditions are satisfied, X is said to follow a binomial distribution. This is written
X ~ B(n, p) or X ~ Bin(n, p)

NOTE: The number of trials, n, and the probability of success, p, are both needed to describe the distribution completely. They are known as the parameters of the binomial distribution.


Writing P(failure) as q where q = 1 − p:

If X ~ B(n, p), the probability of obtaining r successes in n trials is P(X = r) where
P(X = r) = ⁿCᵣ qⁿ⁻ʳ pʳ    for r = 0, 1, 2, 3, ..., n
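The formula P(X = r) = ⁿCᵣ qⁿ⁻ʳ pʳ translates directly into code. A minimal Python sketch, assuming Python 3.8 or later for `math.comb`; `binom_pmf` is a hypothetical helper name, and the loop re-computes the blood-type probabilities worked out by hand above.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), computed as nCr * q^(n - r) * p^r."""
    if not 0 <= r <= n:
        return 0.0
    q = 1 - p
    return comb(n, r) * q ** (n - r) * p ** r


# Exactly two people with blood type B when n = 3, 8 and 12 people are selected
for n in (3, 8, 12):
    print(n, comb(n, 2), f"{binom_pmf(2, n, 0.1):.2g}")
```

This prints the number of arrangements and the probability (to 2 s.f.) for each group size: 3 and 0.027, 28 and 0.15, 66 and 0.23.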

For the three situations described above:

When 3 people are selected, n = 3, p = 0.1, q = 0.9.
X is the number of successful outcomes in 3 trials, so X ~ B(3, 0.1).
P(X = 2) = ³C₂ q¹ p²
         = 3 × 0.9 × 0.1²
         = 0.027

When 8 people are selected, n = 8, p = 0.1, q = 0.9.
X is the number of successful outcomes in 8 trials, so X ~ B(8, 0.1).
P(X = 2) = ⁸C₂ q⁶ p²    Note that ⁸C₂ = 28
         = 28 × 0.9⁶ × 0.1²
         = 0.15 (2 s.f.)

When 12 people are selected, n = 12, p = 0.1, q = 0.9.
X is the number of successful outcomes in 12 trials, so X ~ B(12, 0.1).
P(X = 2) = ¹²C₂ q¹⁰ p²    Note that ¹²C₂ = 66
         = 66 × 0.9¹⁰ × 0.1²
         = 0.23 (2 s.f.)


Example 5.7
At Sellitall Supermarket, 60% of customers pay by credit card. Find the probability that in a randomly selected sample of ten customers
(a) exactly two pay by credit card,
(b) more than seven pay by credit card.

Solution 5.7
X is the number of customers in a sample of ten who pay by credit card.
Consider ‘paying by credit card’ as success, p = 0.6, q = 1 − p = 0.4.
Assuming independence, a binomial model can be used, with n = 10,
so X ~ B(10, 0.6).
(a) P(X = 2) = ¹⁰C₂ q⁸ p²    (the index numbers add up to 10)
    = 45 × 0.4⁸ × 0.6²
    = 0.011 (2 s.f.)
(b) P(X > 7) = P(X = 8) + P(X = 9) + P(X = 10)
    = ¹⁰C₈ q² p⁸ + ¹⁰C₉ q¹ p⁹ + ¹⁰C₁₀ q⁰ p¹⁰
    = 45 × 0.4² × 0.6⁸ + 10 × 0.4 × 0.6⁹ + 0.6¹⁰
    = 0.17 (2 s.f.)
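A quick numerical check of Solution 5.7, written as a minimal Python sketch (`binom_pmf` is an illustrative helper name, and `math.comb` needs Python 3.8 or later). Part (b) is simply the sum of the three terms r = 8, 9, 10.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * (1 - p) ** (n - r) * p ** r


n, p = 10, 0.6  # X ~ B(10, 0.6): 'paying by credit card' is a success
exactly_two = binom_pmf(2, n, p)
more_than_seven = sum(binom_pmf(r, n, p) for r in range(8, n + 1))  # r = 8, 9, 10
print(f"{exactly_two:.2g} {more_than_seven:.2g}")  # 0.011 0.17
```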


It is useful to note that, for any binomial distribution,

P(X = 0) = ⁿC₀ qⁿ p⁰, but p⁰ = 1 and ⁿC₀ = n!/(0! n!) = 1 since 0! = 1,
so P(X = 0) = qⁿ

Also P(X = n) = ⁿCₙ q⁰ pⁿ, but q⁰ = 1 and ⁿCₙ = n!/(n! 0!) = 1,
so P(X = n) = pⁿ

There is a link between the probabilities in the binomial distribution and the terms in the binomial expansion of (q + p)ⁿ, which you may have studied in Pure Mathematics. This is illustrated in the following example.


Example 5.8
Five independent trials of an experiment are carried out. The probability of a successful outcome is p and the probability of failure is 1 − p = q.
Write out the probability distribution of X, where X is the number of successful outcomes in five trials. Comment on your answer.


Solution 5.8
X ~ B(5, p) and X takes the values 0, 1, 2, 3, 4, 5

P(X = 0) = ⁵C₀ q⁵ p⁰ = q⁵
P(X = 1) = ⁵C₁ q⁴ p¹ = 5q⁴p
P(X = 2) = ⁵C₂ q³ p² = 10q³p²
P(X = 3) = ⁵C₃ q² p³ = 10q²p³
P(X = 4) = ⁵C₄ q¹ p⁴ = 5qp⁴
P(X = 5) = ⁵C₅ q⁰ p⁵ = p⁵

Notice that the powers of p and q add up to 5 each time.

The terms q⁵, 5q⁴p, ..., p⁵ are the terms in the binomial expansion of (q + p)⁵.

So q⁵ + 5q⁴p + 10q³p² + 10q²p³ + 5qp⁴ + p⁵ = (q + p)⁵,
that is, P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = (q + p)⁵.

But (q + p)⁵ = 1, since q + p = 1,
so P(X = 0) + P(X = 1) + ... + P(X = 5) = 1.

This confirms that the total sum of the probabilities is 1.
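The same conclusion can be checked numerically. In this minimal Python sketch (the helper name `distribution` is hypothetical; `math.comb` needs Python 3.8 or later) the coefficients 1, 5, 10, 10, 5, 1 are the binomial coefficients ⁵Cᵣ, and for any p the five-trial probabilities add up to 1, apart from floating-point rounding.

```python
from math import comb


def distribution(n, p):
    """[P(X = 0), ..., P(X = n)] for X ~ B(n, p)."""
    q = 1 - p
    return [comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]


# Coefficients of the terms q^5, q^4 p, ..., p^5 in (q + p)^5
print([comb(5, r) for r in range(6)])  # [1, 5, 10, 10, 5, 1]

# Because q + p = 1, the probabilities always total 1 (up to rounding error)
for p in (0.1, 0.35, 0.5, 0.9):
    print(p, round(sum(distribution(5, p)), 10))
```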


NOTE: Some vertical line graphs illustrating the binomial distribution are given on page 289.


Example 5.9
The random variable X is distributed B(7, 0.2). Find, correct to three decimal places,
(a) P(X = 3),
(b) P(1 < X ≤ 4),
(c) P(X > 1).


Solution 5.9
p = 0.2, q = 1 − p = 0.8, n = 7

(a) P(X = 3) = ⁷C₃ q⁴ p³
    = 35 × 0.8⁴ × 0.2³
    = 0.115 (3 d.p.)

(b) P(1 < X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4)
    = ⁷C₂ q⁵ p² + ⁷C₃ q⁴ p³ + ⁷C₄ q³ p⁴
    = 21 × 0.8⁵ × 0.2² + 35 × 0.8⁴ × 0.2³ + 35 × 0.8³ × 0.2⁴
    = 0.419 (3 d.p.)

(c) P(X > 1) = P(X = 2) + P(X = 3) + ... + P(X = 7)
Rather than calculate all these terms, it is much quicker to find

P(X > 1) = 1 − P(X ≤ 1)
    = 1 − (P(X = 0) + P(X = 1))
    = 1 − (q⁷ + ⁷C₁ q⁶ p)
    = 1 − (0.8⁷ + 7 × 0.8⁶ × 0.2)
    = 0.423 (3 d.p.)
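The three answers to Example 5.9 can be reproduced with a minimal Python sketch (helper name illustrative; Python 3.8+ assumed for `math.comb`). It also confirms that the complement shortcut in part (c) gives the same value as adding the six terms P(X = 2), ..., P(X = 7) directly.

```python
from math import comb


def binom_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * (1 - p) ** (n - r) * p ** r


n, p = 7, 0.2
a = binom_pmf(3, n, p)                                       # P(X = 3)
b = sum(binom_pmf(r, n, p) for r in range(2, 5))             # P(1 < X <= 4): r = 2, 3, 4
c = 1 - (binom_pmf(0, n, p) + binom_pmf(1, n, p))            # P(X > 1) = 1 - P(X <= 1)
c_direct = sum(binom_pmf(r, n, p) for r in range(2, n + 1))  # the long way

print(round(a, 3), round(b, 3), round(c, 3))  # 0.115 0.419 0.423
```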


Example 5.10

A box contains a large number of pens. The probability that a pen is faulty is 0.1.
How many pens would you need to select to be more than 95% certain of picking at least one faulty one?

Solution 5.10

Let n be the number of pens you need to select. X is the number of faulty pens in n.
Assuming independence and using a binomial model, X ~ B(n, 0.1), with p = 0.1, q = 0.9.

You want P(X ≥ 1) > 0.95

But P(X ≥ 1) = 1 − P(X = 0)
    = 1 − 0.9ⁿ

So 1 − 0.9ⁿ > 0.95
1 − 0.95 > 0.9ⁿ
0.05 > 0.9ⁿ
i.e. 0.9ⁿ < 0.05
By trial and improvement
0.9²⁵ = 0.071... (greater than 0.05)
0.9³⁰ = 0.042... (less than 0.05)
So the value of n lies between 25 and 30.
0.9²⁶ = 0.0646... (greater than 0.05)
0.9²⁷ = 0.058... (greater than 0.05)
NOTE: On the calculator: [0.9] [xʸ] [26] [=] (0.0646...).
To get 0.9²⁷ all you have to do is press [×] [0.9] [=] (0.0581...) and so on. The answers are of course getting smaller because you are multiplying by a number between 0 and 1.
0.9²⁸ = 0.0523... (greater than 0.05)
0.9²⁹ = 0.0471... (less than 0.05)

You need to select at least 29 pens.
Alternatively, using logarithms:

0.9ⁿ < 0.05
Taking logs to base 10 of both sides,

n log 0.9 < log 0.05

From the calculator, you find that log 0.9 = −0.045..., so divide both sides by log 0.9 and reverse the inequality (as you are dividing by a negative quantity).

n > log 0.05 / log 0.9
n > 28.4...

The least value of n is 29, as before.
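Both methods in Solution 5.10 are easy to mechanise. The minimal Python sketch below is an illustration, not part of the original solution: the function name `least_sample_size` is hypothetical. The loop is the trial-and-improvement method, increasing n one step at a time, and the last line is the logarithm method, rounding the bound up to the next whole number.

```python
import math


def least_sample_size(p_faulty, certainty):
    """Smallest n with P(at least one faulty) = 1 - q**n > certainty, by trial and improvement."""
    q = 1 - p_faulty
    n = 1
    while 1 - q ** n <= certainty:
        n += 1
    return n


print(least_sample_size(0.1, 0.95))  # 29
# The logarithm method: n > log 0.05 / log 0.9 = 28.4..., so round up
print(math.ceil(math.log(0.05) / math.log(0.9)))  # 29
```

Rounding up with `math.ceil` is only safe here because the bound 28.4... is not a whole number; if it were, the strict inequality would require the next integer.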


Using cumulative binomial probability tables

If you have access to these tables, you may wish to use them to calculate probabilities. The tables are printed on page 645. They give P(X ≤ r) for various values of n and p, where X ~ B(n, p). Here is an extract for B(7, 0.2), the distribution used in Example 5.9.

n = 7, p = 0.2
r = 0    0.2097
r = 1    0.5767
r = 2    0.8520
r = 3    0.9667
r = 4    0.9953
r = 5    0.9996
r = 6    1.0000
r = 7    1.0000

Using the tables to work out the probabilities required in Example 5.9:
(a) P(X = 3) = P(X ≤ 3) − P(X ≤ 2)
    = 0.9667 − 0.8520
    = 0.115 (3 d.p.)

(b) P(1 < X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4)
    = P(X ≤ 4) − P(X ≤ 1)
    = 0.9953 − 0.5767
    = 0.419 (3 d.p.)
(c) P(X > 1) = 1 − P(X ≤ 1)
    = 1 − 0.5767
    = 0.423 (3 d.p.)
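If printed tables are not to hand, a column of cumulative probabilities can be generated by keeping a running total of the individual probabilities. A minimal Python sketch (the function name is hypothetical; values are rounded to four decimal places to match the table layout above):

```python
from math import comb


def binom_cdf_table(n, p):
    """Cumulative probabilities P(X <= r), r = 0..n, rounded like one column of the tables."""
    q = 1 - p
    total, table = 0.0, []
    for r in range(n + 1):
        total += comb(n, r) * q ** (n - r) * p ** r
        table.append(round(total, 4))
    return table


cdf = binom_cdf_table(7, 0.2)
print(cdf)  # [0.2097, 0.5767, 0.852, 0.9667, 0.9953, 0.9996, 1.0, 1.0]
# Parts (a), (b) and (c) of Example 5.9, read from the cumulative column
print(round(cdf[3] - cdf[2], 3), round(cdf[4] - cdf[1], 3), round(1 - cdf[1], 3))
```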


Example 5.11

The random variable X is distributed B(5, 0.3). Use the extract from the cumulative binomial probability tables to find, correct to three decimal places,
(a) P(X ≤ 4)
(b) P(X = 2)
(c) P(X < 3)
(d) P(X > 1)
(e) P(X ≥ 3)

Solution 5.11

(a) P(X ≤ 4) = 0.9976 = 0.998 (3 d.p.)
(b) P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = 0.8369 − 0.5282 = 0.3087 = 0.309 (3 d.p.)
(c) P(X < 3) = P(X ≤ 2) = 0.8369 = 0.837 (3 d.p.)
(d) P(X > 1) = 1 − P(X ≤ 1) = 1 − 0.5282 = 0.4718 = 0.472 (3 d.p.)
(e) P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − 0.8369 = 0.1631 = 0.163 (3 d.p.)


Using symmetry properties to read the binomial tables

In the cumulative binomial tables, values of p are given only up to p = 0.5. To use the tables for values of p > 0.5, you need to use the symmetry properties of the binomial distribution.

This is illustrated in the sketches of the probability distributions of B(5, 0.3) and B(5, 0.7). In both these distributions n = 5, and note that 0.7 = 1 − 0.3.


[Figure: vertical line graphs of the probability distributions of X ~ B(5, 0.3) and X ~ B(5, 0.7), with x from 0 to 5 and P(X = x) from 0 to 0.4.]
You can see that
P(X = 0 | p = 0.3) = 0.17 = P(X = 5 | p = 0.7)
P(X = 1 | p = 0.3) = 0.36 = P(X = 4 | p = 0.7)
P(X = 2 | p = 0.3) = 0.31 = P(X = 3 | p = 0.7)
and so on.
Also P(X ≤ 2 | p = 0.3) = 0.84 = P(X ≥ 3 | p = 0.7)
In general
P(X = r | X ~ B(n, p)) = P(X = n − r | X ~ B(n, 1 − p))
P(X ≤ r | X ~ B(n, p)) = P(X ≥ n − r | X ~ B(n, 1 − p))
P(X ≥ r | X ~ B(n, p)) = P(X ≤ n − r | X ~ B(n, 1 − p))
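The symmetry results can be confirmed by brute force. This minimal Python sketch (helper names `pmf` and `cdf` are illustrative) checks the first two identities for every r when n = 8 and p = 0.6, writing P(X ≥ n − r) as 1 − P(X ≤ n − r − 1). It then reads P(X ≥ 3 | p = 0.6) as P(X ≤ 5 | p = 0.4), the calculation used in part (a) of Example 5.12.

```python
from math import comb


def pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * (1 - p) ** (n - r) * p ** r


def cdf(r, n, p):
    """P(X <= r) for X ~ B(n, p); an empty sum (r < 0) gives 0."""
    return sum(pmf(k, n, p) for k in range(r + 1))


n, p = 8, 0.6
for r in range(n + 1):
    # P(X = r | p) = P(X = n - r | 1 - p)
    assert abs(pmf(r, n, p) - pmf(n - r, n, 1 - p)) < 1e-12
    # P(X <= r | p) = P(X >= n - r | 1 - p) = 1 - P(X <= n - r - 1 | 1 - p)
    assert abs(cdf(r, n, p) - (1 - cdf(n - r - 1, n, 1 - p))) < 1e-12

# P(X >= 3 | p = 0.6) read as P(X <= 5 | p = 0.4)
print(round(1 - cdf(2, n, p), 4), round(cdf(5, n, 1 - p), 4))  # 0.9502 0.9502
```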


Example 5.12

The random variable X is B(8, 0.6).
Use the extract of the cumulative binomial tables for X ~ B(8, 0.4) to find
(a) P(X ≥ 3)
(b) P(X ≤ 2)
(c) P(X = 5)

Solution 5.12
Using n = 8,

(a) P(X ≥ 3 | p = 0.6) = P(X ≤ 5 | p = 0.4) = 0.9502    (the two values 3 and 5 add up to 8)

(b) P(X ≤ 2 | p = 0.6) = P(X ≥ 6 | p = 0.4)
    = 1 − P(X ≤ 5 | p = 0.4)
    = 1 − 0.9502
    = 0.0498

(c) P(X = 5 | p = 0.6) = P(X = 3 | p = 0.4)
    = P(X ≤ 3 | p = 0.4) − P(X ≤ 2 | p = 0.4)
    = 0.5941 − 0.3154
    = 0.2787


NOTE:
• It is sometimes quicker to use the cumulative tables, but you should make sure that you know how to calculate the probabilities directly.
• The tables are not available for all possible values of p.
• The values given in the tables should agree with the calculated values to three decimal places.


Exercise 5b The binomial distribution 


Give answers to three significant figures. 


1. 30% of pupils in a school travel to school by 
bus. 
From a sample of ten pupils chosen at random, 
find the probability that 


(a) only three travel by bus,
(b) less than half travel by bus. 


2. In a survey on washing powder, it is found that
the probability that a shopper chooses Soapysuds 
is 0.25. Find the probability that in a random 
sample of nine shoppers 
(a) exactly three choose Soapysuds, 

(b) more than seven choose Soapysuds. 


3. A bag contains counters of which 40% are red 
and the rest yellow. A counter is taken from the 
bag, its colour noted and then replaced. This is 
performed eight times in all. 

Calculate the probability that 


(a) exactly three will be red, 
(b) at least one will be red,
(c) more than four will be yellow.
4. The random variable X is B(6, 0.42). Find
(a) P(X = 6), (b) P(X = 4), (c) P(X < 2).


5. An unbiased die is thrown seven times. Find the 
probability of throwing at least 5 sixes. 


6. The probability that it will rain on any given day
in September is 0.3. Stating any assumption
made, calculate the probability that in a given
week in September, it will rain on
(a) exactly two days,
(b) at least two days,
(c) at most two days,
(d) exactly three days that are consecutive.


7. A fair coin is tossed six times. Find the
probability of throwing at least four heads.

8. Assuming that a couple are equally likely to
produce a boy or a girl, find the probability that
in a family of five children there are more boys
than girls.

9. X is B(4, p) and P(X = 4) = 0.0256.
Find P(X = 2).


10. Charlie finds that when she takes a cutting from
a particular plant, the probability that it roots 
successfully is }. 


(a) She takes nine cuttings. Find the probability 

that 

(i) more than five cuttings root 

successfully, 

(ii) at least three cuttings root successfully, 
(b) Find the number of cuttings that she should 
take in order to be 99% certain that at least 
one cutting roots successfully. 




11. An experiment consists of taking seven shots at a 
target and counting the number of hits. 
The probability of hitting the target with a single 
shot is 0.6. Using a binomial model, find the 
probability that in seven attempts the target is hit 
at most twice. 
Give a reason why the binomial model may not 
be a good one to use in this situation. 


12. In the mass production of bolts it is found that 
5% are defective. Bolts are selected at random 
and put into packets of ten. 

A packet is selected at random. Find the 
probability that it contains 


(a) three defective bolts, 
(b) less than three defective bolts. 


Two packets are selected at random. 


(c) Find the probability that there are no 
defective bolts in either packet. 


13. A coin is biased so that it is twice as likely to 
show heads as tails. The coin is tossed five times. 
Calculate the probability that 


(a) exactly three heads are obtained,
(b) more than three heads are obtained.

14. The random variable X can be modelled by a
binomial distribution with n = 6 and p = 0.5.
Construct the probability distribution and
illustrate it graphically. Comment on the
distribution.

15. The probability that a target is hit is 0.3. Find
the least number of shots which should be fired if
the probability that the target is hit at least once
is greater than 0.95.

State any assumptions that you have made. 




16. 1% of light bulbs in a box are faulty. Using a
binomial model, find the largest sample size 
which can be taken if it is required that the 
probability that there are no faulty bulbs in the 
sample is greater than 0.5. 

Comment on the use of the binomial model in 
this situation. 


17. In a test there are ten multiple choice questions.
For each question there is a choice of four
answers, only one of which is correct. A student
guesses each of the answers.
(a) Find the probability that he gets more than
seven correct.
He needs to obtain over half marks to pass and
each question carries equal weight.
(b) Find the probability that he passes the test.

18. X ~ B(n, 0.3). Find the least possible value of n
such that P(X ≥ 1) ≥ 0.8.

19. Given that X ~ B(7, 0.85) use the cumulative
binomial probability tables on page 646 to write
out the probability distribution of X.

20. The random variable X is B(n, 0.6) and
P(X < 1) = 0.0256. Find the value of n.

21. For each of the experiments described below,
state, giving a reason, whether a binomial
distribution is appropriate.

Experiment 1: A bag contains black, white and
red marbles that are selected one at a time, with
replacement. The colour of each marble is noted.

Experiment 2: This experiment is a repeat of
experiment 1 except that the bag contains black
and white marbles only.

Experiment 3: This experiment is a repeat of
experiment 2 except that the marbles are not
replaced after each selection. (L)


EXPECTATION AND VARIANCE OF THE BINOMIAL DISTRIBUTION

It can be shown that

If X ~ B(n, p)
E(X) = np and Var(X) = npq where q = 1 − p

These results can be quoted and should be learnt. They are illustrated in the following example.

Example 5.13


The random variable X is B(4, 0.8). Construct the probability distribution for X and find the
expectation and variance. Verify that E(X) = np and Var(X) = npq.

Solution 5.13
X is B(4, 0.8) so n = 4 and p = 0.8.

P(X = 0) = 0.2⁴ = 0.0016
P(X = 1) = 4 × 0.2³ × 0.8 = 0.0256
P(X = 2) = ⁴C₂ × 0.2² × 0.8² = 0.1536
P(X = 3) = ⁴C₃ × 0.2 × 0.8³ = 0.4096
P(X = 4) = 0.8⁴ = 0.4096

The probability distribution for X is

x          0        1        2        3        4
P(X = x)   0.0016   0.0256   0.1536   0.4096   0.4096

E(X) = ΣxP(X = x)
     = 0 × 0.0016 + 1 × 0.0256 + 2 × 0.1536 + 3 × 0.4096 + 4 × 0.4096
     = 3.2

E(X²) = Σx²P(X = x)
      = 1 × 0.0256 + 4 × 0.1536 + 9 × 0.4096 + 16 × 0.4096
      = 10.88
Var(X) = E(X²) − E²(X)
       = 10.88 − 3.2²
       = 0.64
Now np = 4 × 0.8 = 3.2, so E(X) = np
npq = 4 × 0.8 × 0.2 = 0.64, so
Var(X) = npq
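The verification in Solution 5.13 can be repeated in a few lines. This minimal Python sketch (Python 3.8+ assumed for `math.comb`) builds the distribution of B(4, 0.8), computes E(X) and Var(X) = E(X²) − E²(X) from the table, and compares them with np and npq.

```python
from math import comb

n, p = 4, 0.8
q = 1 - p
probs = [comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(probs))         # E(X) = sum of x P(X = x)
mean_sq = sum(x * x * px for x, px in enumerate(probs))  # E(X^2) = sum of x^2 P(X = x)
variance = mean_sq - mean ** 2                          # Var(X) = E(X^2) - E^2(X)

print([round(px, 4) for px in probs])           # [0.0016, 0.0256, 0.1536, 0.4096, 0.4096]
print(round(mean, 4), round(n * p, 4))          # 3.2 3.2
print(round(variance, 4), round(n * p * q, 4))  # 0.64 0.64
```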


Example 5.14

The probability that it will be a fine day is 0.4. Find the expected number of fine days in a week and the standard deviation.

Solution 5.14

Let X be the number of fine days in a week. Assuming that the weather on one day is independent of the weather on other days,

X ~ B(n, p) with n = 7 and p = 0.4

The expected number of fine days = E(X)
    = np
    = 7 × 0.4
    = 2.8

Standard deviation of X = √Var(X)
    = √(npq)
    = √(7 × 0.4 × 0.6)
    = 1.3 days (2 s.f.)




Example 5.15
X is B(n, p) with mean 5 and standard deviation 2. Find the values of n and p.

Solution 5.15

E(X) = np, therefore np = 5 ...(1)

Var(X) = npq, therefore npq = 2² = 4 ...(2)

Substituting for np in equation (2): 5q = 4
    q = 0.8
So p = 1 − q
    p = 0.2
Substituting for p in equation (1): n × 0.2 = 5
    n = 25
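The elimination in Solution 5.15 works for any given mean and standard deviation: dividing npq by np gives q, and then n = mean/p. A minimal Python sketch (the function name `binomial_parameters` is hypothetical); it raises an error if the data do not lead to a whole number of trials, and rounds p only to hide floating-point noise.

```python
def binomial_parameters(mean, sd):
    """Recover (n, p) for X ~ B(n, p) from E(X) = np and Var(X) = npq."""
    q = sd ** 2 / mean  # npq / np = q
    p = 1 - q
    n = mean / p        # np = mean
    if abs(n - round(n)) > 1e-9:
        raise ValueError("mean and sd do not give a whole number of trials")
    return round(n), round(p, 6)


print(binomial_parameters(5, 2))  # (25, 0.2)
```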


Fitting a theoretical distribution to practical data

It is sometimes useful to compare experimental results with theoretical data as illustrated in the following example.


Example 5.16
A biased coin is tossed four times and the number of heads noted. The experiment is performed 500 times in all and the results are summarised in the table:

Number of heads   0   1   2   3   4
Frequency

(a) From the experimental data, estimate the probability of obtaining a head when the coin is tossed.
(b) Using a binomial distribution with the same mean, calculate the theoretical probabilities of obtaining 0, 1, 2, 3 and 4 heads.

Solution 5.16
(a) For the frequency distribution,

mean = x̄ = Σfx / Σf = 1300 / 500 = 2.6

Let X be the number of heads in four tosses. Then X ~ B(4, p).
For a distribution with the same mean, np = 2.6, so 4p = 2.6
p = 0.65

An estimate of the probability that the coin shows heads is 0.65.

(b) Using X ~ B(4, 0.65), calculate the probabilities of 0, 1, 2, 3 and 4 heads, then multiply these by 500 to obtain the theoretical frequencies.

x    P(X = x)                          Theoretical frequency (nearest integer)
0    0.35⁴ = 0.015...                    8
1    4 × 0.35³ × 0.65 = 0.111...        56
2    6 × 0.35² × 0.65² = 0.310...      155
3    4 × 0.35 × 0.65³ = 0.384...       192
4    0.65⁴ = 0.178...                   89
                                Total  500

The theoretical frequencies compare reasonably well with the original distribution.
A statistical test to compare the two sets of data, the χ² test, is illustrated on page 571.
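The fitting procedure of Example 5.16 can be sketched in Python as follows. Because the experimental frequency table is not reproduced here, the sketch starts from the totals Σfx = 1300 and Σf = 500 quoted in the solution; the function name `fitted_frequencies` is hypothetical.

```python
from math import comb


def fitted_frequencies(n, p, total):
    """Theoretical binomial frequencies total * P(X = x), x = 0..n."""
    q = 1 - p
    return [total * comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]


# (a) sample mean 1300/500 = 2.6 = 4p, so p = 0.65
p_hat = (1300 / 500) / 4
# (b) scale the B(4, 0.65) probabilities by the 500 repetitions
freqs = fitted_frequencies(4, p_hat, 500)
print(round(p_hat, 2), [round(f) for f in freqs])  # 0.65 [8, 56, 155, 192, 89]
```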


Diagrammatic representation of the binomial distribution 


[Figure: vertical line graphs of P(X = x) against x for several binomial distributions; in one graph the probabilities for the larger values of x are too small to illustrate.]


The mode of the binomial distribution
The mode is the value of X that is most likely to occur.
From the probability distribution sketches above, it can be seen that
• when p = 0.5 and n is odd, there are two modes,
• otherwise the distribution usually has one mode.
More precisely, there are two modes, (n + 1)p − 1 and (n + 1)p, whenever (n + 1)p is a whole number (for example X ~ B(4, 0.8) in Example 5.13, where P(X = 3) = P(X = 4) = 0.4096); otherwise there is a single mode.

The mode can be found by calculating the probability of every value of X and choosing the value with the highest probability. However, this is very tedious, and it is usually only necessary to consider the probabilities of values of X close to the mean, np.

Example 5.17

The probability that a student is awarded a distinction in the Mathematics examination is 0.05.
In a randomly selected group of 50 students, what is the most likely number of students awarded a distinction?

Solution 5.17

X is the number of students who are awarded a distinction in 50, so X ~ B(50, 0.05).

E(X) = np = 50 × 0.05 = 2.5, so calculate the probabilities for values of X near 2.5.

P(X = 1) = 50 × 0.95⁴⁹ × 0.05 = 0.202...
P(X = 2) = ⁵⁰C₂ × 0.95⁴⁸ × 0.05² = 0.261...
P(X = 3) = ⁵⁰C₃ × 0.95⁴⁷ × 0.05³ = 0.219...

From the list, you can see that the value of X with the highest probability is 2.

The most likely number of students awarded a distinction in a group of 50 is two.
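A computer can afford the ‘tedious’ approach of evaluating every probability, which also exposes distributions with two modes. A minimal Python sketch (the function name `binomial_modes` is hypothetical); the small relative tolerance makes sure that two probabilities that are equal in exact arithmetic are both reported despite floating-point rounding.

```python
from math import comb


def binomial_modes(n, p, rel_tol=1e-9):
    """All values of X whose probability equals the largest probability (allowing for rounding error)."""
    q = 1 - p
    probs = [comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]
    best = max(probs)
    return [x for x, px in enumerate(probs) if px >= best * (1 - rel_tol)]


print(binomial_modes(50, 0.05))  # [2]
print(binomial_modes(4, 0.8))    # [3, 4]
print(binomial_modes(5, 0.5))    # [2, 3]
```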


Exercise 5c Expectation, variance and mode of the binomial distribution 


1. 10% of the articles from a certain production 
line are defective. A sample of 25 articles is 
taken. Find the expected number of defective 
items and the standard deviation. 


2. The probability that an apple picked at random 
from a sack is bad is 0.15. 


(a) Find the standard deviation of the number 
of bad apples in a sample of 15 apples. 

(b) What is the most likely number of bad
apples in a sample of 30 apples? 


3. The random variable X is B(n, 0.3) and
E(X) = 2.4. Find n and the standard deviation
of X.


4. In a group of people the expected number who
wear glasses is two and the variance is 1.6. 
Find the probability that 


(a) a person chosen at random from the group 
wears glasses, 
(b) six people in the group wear glasses. 


5. The random variable X is B(10, p) where p < 0.5. 
The variance of X is 1.875. Find 


(a) the value of p, 
(b) E(X),
(c) P(X =2). 


6. A die is biased and the probability, p, of
throwing a six is known to be less than g. An 
experiment consists of recording the number of 
sixes in 25 throws of the die. 


In a large number of experiments the standard 
deviation of the number of sixes is 1.5. 

Calculate the value of p and hence determine, to 

two places of decimals, the probability that 

exactly three sixes are recorded during a 

particular experiment. (C)


7. In a certain African village, 80% of the villagers
are known to have a particular eye disorder.
Twelve people are waiting to see the nurse.
(a) What is the most likely number to have the
eye disorder?
(b) Find the probability that fewer than half
have the eye disorder.

8. In a bag there are six red counters, eight yellow
counters and six green counters. An experiment
consists of taking a counter at random from the
bag, noting its colour and then replacing it in the
bag. This procedure is carried out ten times in
all. Find
(a) the expected number of red counters drawn,
(b) the most likely number of green counters
drawn,
(c) the probability that no more than four
yellow counters are drawn.

9. The random variable X is distributed binomially
with mean 2 and variance 1.6. Find
(a) the probability that X is less than 6,
(b) the most likely value of X.


10. Seeds are planted in rows of six and after 14
days the number of seeds which have germinated
in each of the 100 rows is noted.
The results are shown in the table:

Number of seeds germinating   0   1   2   3    4    5    6
Number of rows                2   …   2   10   30   35   20

Find the theoretical frequencies of 0, 1, ..., 6
seeds germinating in a row, using the associated
theoretical binomial distribution.

11. Each day a bakery delivers the same number of
loaves to a certain shop which sells, on average,
98% of them. Assuming that the number of
loaves sold per day has a binomial distribution
with a standard deviation of 7, find the number
of loaves the shop would expect to sell per day.
(C Additional)

12. In a large batch of items from a production line
the probability that an item is faulty is p.
400 samples, each of size 5, are taken and the
number of faulty items in each batch is noted.
From the frequency distribution below estimate p
and work out the expected frequencies of 0, 1, 2,
3, 4, 5 faulty items per batch for a theoretical
binomial distribution having the same mean.

Number of faulty items   0     1    2    3   4   5
Frequency                297   90   10   2   1   0

13. On average 20% of the bolts produced by a
machine in a factory are faulty. Samples of ten
bolts are to be selected at random each day.
Each bolt will be selected and replaced in the set
of bolts which have been produced on that day.
(a) Calculate, to two significant figures, the
probability that, in any one sample, two
bolts or less will be faulty.
(b) Find the expected value and the variance of
the number of bolts in a sample which will
not be faulty. (L Additional)

14. An experiment consists of taking 12 shots at a
target and counting the number of hits.
When this experiment was repeated a large
number of times the mean number of hits was
found to be 3. Calculate
(a) the probability of hitting the target with a
single shot,
(b) the standard deviation of the number of hits
in an experiment. (C Additional)

15. In an experiment a certain number of dice are
thrown and the number of sixes obtained is
recorded. The dice are all biased and the
probability of obtaining a six with each individual
die is p. In all there were 60 experiments and the
results are shown in the table.

Number of sixes obtained in an experiment   0    1    2   3   4   5
Frequency                                   19   26   …

Calculate the mean and the standard deviation of
these data.
By comparing these answers with those expected
for a binomial distribution, estimate
(a) the number of dice thrown in each
experiment,
(b) the value of p. (C Additional)


THE POISSON DISTRIBUTION

Consider these random variables:
• the number of emergency calls received by an ambulance control in an hour,
• the number of vehicles approaching a motorway toll bridge in a five-minute interval,
• the number of flaws in a metre length of material,
• the number of white corpuscles on a slide.

Assuming that each occurs randomly, they are all examples of variables that can be modelled using a Poisson distribution.


Conditions for a Poisson model

• Events occur singly and at random in a given interval of time or space,
• λ, the mean number of occurrences in the given interval, is known and is finite.

The variable X is the number of occurrences in the given interval.
If the above conditions are satisfied, X is said to follow a Poisson distribution, written
X ~ Po(λ), where

P(X = x) = e^(−λ) λˣ / x!    for x = 0, 1, 2, 3, ... to infinity
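The Poisson formula can be written as a one-line function. A minimal Python sketch (the name `poisson_pmf` is hypothetical) using `math.exp` and `math.factorial`; summing the first 60 terms for λ = 4 shows that the probabilities add up to 1, since the remaining tail is negligible.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for X ~ Po(lam), x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam ** x / math.factorial(x)


# The probabilities over x = 0, 1, 2, ... add up to 1 (the tail beyond x = 59 is negligible here)
print(round(sum(poisson_pmf(x, 4.0) for x in range(60)), 12))
```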


Example 5.18

A student finds that the average number of amoebas in 10 ml of pond water from a particular pond is four. Assuming that the number of amoebas follows a Poisson distribution, find the probability that in a 10 ml sample

(a) there are exactly five amoebas,
(b) there are no amoebas,
(c) there are fewer than three amoebas.

Solution 5.18
X is the number of amoebas in 10 ml of pond water, where X ~ Po(4).

Using P(X = x) = e^(−λ) λˣ / x! with λ = 4,

(a) P(X = 5) = e⁻⁴ × 4⁵ / 5!
    = 0.156 (3 s.f.)

(b) P(X = 0) = e⁻⁴ × 4⁰ / 0!
    = 0.0183 (3 s.f.)

(c) P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)
    = e⁻⁴ × 4⁰/0! + e⁻⁴ × 4¹/1! + e⁻⁴ × 4²/2!
    = e⁻⁴(1 + 4 + 8)
    = 13e⁻⁴
    = 0.238 (3 s.f.)
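The three answers to Example 5.18 can be reproduced with a minimal Python sketch (the helper name is illustrative). Note that part (b) is e⁻⁴ itself, which is about 0.0183.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 4  # mean number of amoebas per 10 ml
print(f"{poisson_pmf(5, lam):.3g}")                         # 0.156
print(f"{poisson_pmf(0, lam):.3g}")                         # 0.0183
print(f"{sum(poisson_pmf(x, lam) for x in range(3)):.3g}")  # 0.238, i.e. 13 * e^-4
```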


NOTE:

• P(X = 0) = e^(−λ) λ⁰/0!, but λ⁰ = 1 and 0! = 1, so P(X = 0) = e^(−λ)
• P(X = 1) = e^(−λ) λ¹/1!, but λ¹ = λ and 1! = 1, so P(X = 1) = λe^(−λ)

These two results are useful in general:
If X ~ Po(λ),
then P(X = 0) = e^(−λ) and P(X = 1) = λe^(−λ)


Unit interval 


Care must be taken to specify the interval being considered. 

In Example 5.18 the mean number of amoebas in 10 ml of pond water from a particular pond 
is four so the number in 10 ml is distributed Po(4). 

Now suppose you want to find a probability relating to the number of amoebas in 5 ml of
water from the same pond. The mean number of amoebas in 5 ml is two, so the number in 

5 ml is distributed Po(2). 

Similarly, the number of amoebas in 1 ml of pond water is distributed Po(0.4). 


Example 5.19 


On average the school photocopier breaks down eight times during the school week (Monday 
to Friday). Assuming that the number of breakdowns can be modelled by a Poisson 
distribution, find the probability that it breaks down 

(a) five times in a given week, 

(b) once on Monday,

(c) eight times in a fortnight. 


Solution 5.19 


(a) X is the number of breakdowns in a week, where X ~ Po(8).
$P(X = 5) = e^{-8}\frac{8^5}{5!}$
= 0.0916 (3 s.f.)
(b) Let Y be the number of breakdowns in a day. 
The mean number of breakdowns in a day is 8/5 = 1.6, so Y ~ Po(1.6).
$P(Y = 1) = 1.6e^{-1.6}$
= 0.323 (3 s.f.) 
(c) Let F be the number of breakdowns in a fortnight. 
The mean number of breakdowns in a fortnight is 2 x 8 = 16, so F ~ Po(16). 


$P(F = 8) = e^{-16}\frac{16^8}{8!}$
= 0.0120 (3 s.f.)
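The only new step in Example 5.19 is rescaling the mean to the length of the interval. A minimal Python sketch (our own, with hypothetical names such as `WEEKLY_MEAN`) reproduces the three answers by scaling the weekly mean of 8 in proportion to the interval.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


WEEKLY_MEAN = 8      # breakdowns per five-day school week
DAYS_PER_WEEK = 5

# The mean scales in proportion to the length of the interval
daily_mean = WEEKLY_MEAN / DAYS_PER_WEEK   # 1.6 per day
fortnight_mean = 2 * WEEKLY_MEAN           # 16 per fortnight

p_week = poisson_pmf(5, WEEKLY_MEAN)        # (a) five in a week
p_monday = poisson_pmf(1, daily_mean)       # (b) once on Monday
p_fortnight = poisson_pmf(8, fortnight_mean)  # (c) eight in a fortnight

print(f"{p_week:.3g} {p_monday:.3g} {p_fortnight:.3g}")
```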


Mean and variance of the Poisson distribution 


The mean number of occurrences in the interval, λ, is all that is needed to define the
distribution completely; λ is the only parameter of the distribution.


In a Poisson distribution the mean is E(X) = λ, since λ is defined as the mean number of occurrences; it is also the case that
Var(X) = λ. The following should be learnt:


If X ~ Po(λ),
E(X) = λ and Var(X) = λ
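The equality of mean and variance can be checked numerically. The sketch below (our own; `mean_and_variance` is a hypothetical helper) sums the first 100 probabilities, which for the values of λ tried here captures essentially all of the distribution, and computes E(X) and Var(X) = E(X²) − (E(X))².

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def mean_and_variance(lam, terms=100):
    """Approximate E(X) and Var(X) for X ~ Po(lam) from the first `terms` probabilities."""
    probs = [poisson_pmf(x, lam) for x in range(terms)]
    mean = sum(x * p for x, p in enumerate(probs))
    second_moment = sum(x * x * p for x, p in enumerate(probs))
    return mean, second_moment - mean ** 2


for lam in (0.4, 1.6, 4, 8):
    m, v = mean_and_variance(lam)
    print(f"lambda={lam}: mean={m:.6f}, variance={v:.6f}")
```

This is a numerical illustration only; it is not a proof that Var(X) = λ.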


Example 5.20 


X follows a Poisson distribution with standard deviation 1.5. Find P(X ≥ 3).


Solution 5.20 
If X ~ Po(λ) then Var(X) = λ.
But Var(X) = (standard deviation)² = 1.5² = 2.25,
so λ = 2.25 and X ~ Po(2.25).
P(X ≥ 3) = 1 − P(X ≤ 2)
= 1 − (P(X = 0) + P(X = 1) + P(X = 2))
$= 1 - e^{-2.25}\left(1 + 2.25 + \frac{2.25^2}{2!}\right)$
= 1 − 0.6093...
= 0.391 (3 s.f.)


Using cumulative Poisson probability tables 


If you have access to these tables you may wish to use them to calculate probabilities. The
tables are printed on page 647. As with the cumulative binomial tables, they give P(X ≤ r) for
various values of λ, where X ~ Po(λ).

Here is an extract for Po(1.6).

λ = 1.6
r    P(X ≤ r)
0    0.2019
1    0.5249
2    0.7834
3    0.9212
4    0.9763
5    0.9940
6    0.9987
7    0.9997
8    1.0000
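Cumulative values such as those in the Po(1.6) extract can be regenerated by adding probabilities. The following Python sketch is our own (the helper name `poisson_cdf` is hypothetical); it prints P(X ≤ r) to four decimal places, the precision used in the printed tables.

```python
import math


def poisson_cdf(r, lam):
    """P(X <= r) for X ~ Po(lam): the sum of P(X = x) for x = 0, 1, ..., r."""
    return sum(math.exp(-lam) * lam ** x / math.factorial(x) for x in range(r + 1))


# Rebuild the Po(1.6) column of a cumulative table
for r in range(9):
    print(r, f"{poisson_cdf(r, 1.6):.4f}")
```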


Example 5.21 


Given that X ~ Po(1.6), use cumulative Poisson probability tables to find, to three decimal 
places, 

(a) P(X ≤ 6),

(b) P(X = 5),

(c) P(X ≥ 3),

(d) P(X = 10).

Find also the smallest integer n such that P(X > n) < 0.01.


Solution 5.21 


Using the table printed above, 
(a) P(X ≤ 6) = 0.9987 = 0.999 (3 d.p.)
(b) P(X = 5) = P(X ≤ 5) − P(X ≤ 4)

= 0.9940 − 0.9763

= 0.018 (3 d.p.)


(c) P(X ≥ 3) = 1 − P(X ≤ 2)
= 1 − 0.7834
= 0.217 (3 d.p.)


(d) X takes the values 0, 1, 2, ..., to infinity, but from the tables, P(X ≤ 8) = 1.0000 to four
decimal places. This implies that for values of X greater than 8, the probabilities are very
small, so to three decimal places, P(X = 10) = 0.000.

In fact, using the formula, $P(X = 10) = e^{-1.6}\frac{1.6^{10}}{10!} = 0.000\,006\,117\ldots$


If P(X > n) < 0.01,
then 1 − P(X ≤ n) < 0.01,
so P(X ≤ n) > 0.99.
From tables, P(X ≤ 4) = 0.9763 < 0.99
and P(X ≤ 5) = 0.9940 > 0.99.
The smallest integer n is 5.
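The search for the smallest n can also be carried out by computer. This Python sketch (our own; `smallest_n_with_upper_tail_below` is a hypothetical name) increases n until P(X > n) falls below the required level. It uses exact probabilities rather than four-figure tables, so in a borderline case its answer could in principle differ from a table-based one.

```python
import math


def poisson_cdf(r, lam):
    """P(X <= r) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam ** x / math.factorial(x) for x in range(r + 1))


def smallest_n_with_upper_tail_below(lam, alpha):
    """Smallest integer n with P(X > n) < alpha, i.e. P(X <= n) > 1 - alpha."""
    n = 0
    while 1 - poisson_cdf(n, lam) >= alpha:
        n += 1
    return n


print(smallest_n_with_upper_tail_below(1.6, 0.01))  # prints 5
```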


Diagrammatic representation of the Poisson distribution 


Notice that for small values of λ, the distribution is very (positively) skewed, but it becomes more
symmetrical as λ increases.
[Figure: line graphs of P(X = x) against x for X ~ Po(1), Po(1.6), Po(2), Po(2.2), Po(3), Po(3.8) and Po(5)]


The mode of the Poisson distribution 




The mode is the value of X that is most likely to occur, i.e. the one with the greatest 
probability. 
From the diagrams, you can see that
when λ = 1, there are two modes, 0 and 1,
when λ = 2, there are two modes, 1 and 2,
when λ = 3, there are two modes, 2 and 3.
In general, if λ is an integer, there are two modes, λ − 1 and λ.
For example, if X ~ Po(8), the modes are 7 and 8.


Notice also that


when λ = 1.6, the mode is 1,
when λ = 2.2, the mode is 2,
when λ = 3.8, the mode is 3.


In general, if λ is not an integer, the mode is the integer immediately below λ.


For example, if X ~ Po(4.9), the mode is 4.
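Both rules follow from the ratio $\frac{P(X = x)}{P(X = x - 1)} = \frac{\lambda}{x}$: the probabilities increase while x < λ, are equal at x = λ when λ is an integer, and decrease once x > λ. The Python sketch below (our own; `poisson_modes` is a hypothetical name) finds the modes by direct comparison of probabilities and can be used to check the two rules.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_modes(lam, tol=1e-12):
    """Return every x whose probability is (within tol) the largest for X ~ Po(lam)."""
    # The mode never exceeds the integer part of lam, so x = 0, ..., int(lam) + 1 suffices
    upper = int(lam) + 2
    probs = [poisson_pmf(x, lam) for x in range(upper)]
    best = max(probs)
    return [x for x, p in enumerate(probs) if abs(p - best) <= tol * best]


for lam in (1, 1.6, 2.2, 3.8, 4.9, 8):
    print(lam, poisson_modes(lam))
```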


Fitting a theoretical distribution to practical data 


As with the binomial distribution it is possible to fit a theoretical Poisson distribution to experimental data.


Example 5.22 


I recorded the number of e-mails I received over a period of 150 days with the following results:

Number of e-mails    0    1    2    3    4
Number of days      51   54   36    6    3

(a) Find the mean number of e-mails per day.
(b) Calculate the frequencies of the Poisson distribution having the same mean.


Solution 5.22 


(a) Mean = (0 × 51 + 1 × 54 + 2 × 36 + 3 × 6 + 4 × 3)/150 = 156/150 = 1.04 e-mails per day.

(b) Let X be the number of e-mails received in a day. For a Poisson distribution with the same
mean, use X ~ Po(1.04) and calculate the probabilities of 0, 1, 2, 3, 4, ... e-mails. Multiply
these by 150 to obtain the theoretical frequencies.

x     P(X = x)                                          Frequency (nearest integer)
0     $e^{-1.04}$ = 0.3534...                            53
1     $1.04e^{-1.04}$ = 0.3675...                        55
2     $\frac{1.04^2}{2!}e^{-1.04}$ = 0.1911...           29
3     $\frac{1.04^3}{3!}e^{-1.04}$ = 0.0662...           10
4     $\frac{1.04^4}{4!}e^{-1.04}$ = 0.0172...            3
>4    1 − P(X ≤ 4) = 0.00431...                           1


These compare reasonably well with the original distribution. 


A statistical test to compare the two sets of data, the χ² test, is illustrated on page 573.
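The fitting procedure of Example 5.22 (find the sample mean, then multiply Poisson probabilities by the total frequency) can be sketched in Python as below. The sketch is our own, and it takes the frequency of days with no e-mails to be 51, the value forced by the total of 150 days and the mean of 1.04.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Observed data from Example 5.22: number of e-mails per day over 150 days
emails = [0, 1, 2, 3, 4]
days = [51, 54, 36, 6, 3]

total_days = sum(days)
mean = sum(x * f for x, f in zip(emails, days)) / total_days   # 156 / 150

expected = [total_days * poisson_pmf(x, mean) for x in emails]
# A final class for "more than 4" makes the expected frequencies total 150
expected.append(total_days * (1 - sum(poisson_pmf(x, mean) for x in emails)))

for label, e in zip(["0", "1", "2", "3", "4", ">4"], expected):
    print(label, round(e, 2))
```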


Exercise 5d The Poisson distribution


1. An insurance company receives on average two
claims per week from a particular factory.
Assuming that the number of claims can be
modelled by a Poisson distribution, find the
probability that it receives


(a) three claims in a given week,

(b) more than four claims in a given week,

(c) four claims in a given fortnight,

(d) no claims on a given day, assuming that the
factory operates on a five-day week.


2. A sales manager receives six telephone calls on
average between 9.30 a.m. and 10.30 a.m. on a
weekday. Find the probability that


(a) she will receive two or more calls between
9.30 a.m. and 10.30 a.m. on Tuesday,

(b) she will receive exactly two calls between
9.30 a.m. and 9.40 a.m. on Wednesday,

(c) during a five-day working week, there will
be exactly three days on which she receives
no calls between 10.00 a.m. and 10.10 a.m.


3. The number of bacterial colonies on a petri dish
can be modelled by a Poisson distribution with
average number 2.5 per cm².

Find the probability that


(a) in 1 cm² there are no bacterial colonies,

(b) in 2 cm² there are more than four bacterial
colonies,

(c) in 4 cm² there are six bacterial colonies.


4. On a particular motorway bridge, breakdowns
occur at a rate of 3.2 a week. Assuming that the
number of breakdowns can be modelled by a
Poisson distribution, find the probability that


(a) fewer than the mean number of breakdowns
occur in a particular week,

(b) more than five breakdowns occur in a given
fortnight,

(c) exactly three breakdowns occur in each of
four successive weeks.


5. Cars arrive at a petrol station at an average rate
of 30 per hour. Assuming that the cars arrive at
random, find the probability that


(a) no cars arrive during a particular
five-minute interval,

(b) more than three cars arrive during a
five-minute interval,

(c) more than five cars arrive in a 15-minute
interval,

(d) in a half-hour period, ten cars arrive,

(e) fewer than three cars arrive in a ten-minute
interval.


6. Flaws occur randomly in a roll of fabric at an
average rate of 1.5 per metre length.


(a) Find the probability that in a randomly 
chosen one-metre length there are more than 
two flaws. 

(b) Find the probability that in a randomly 
chosen two-metre length there are no flaws. 

(c) What is the standard deviation of the 
number of flaws in a four-metre length? 


7. The number of calls made to a Health Centre can
be modelled by a Poisson distribution with
standard deviation 2 per five-minute interval.
Find the probability that in a given five-minute
interval, the number of calls is more than the
average for a five-minute interval.


8. The average number of misprints on each page in
the first draft of a novel is four. Find the
probability that on a randomly selected double
page

(a) there are three misprints on each page,

(b) there are six misprints in total.


9. The number of goals scored in a match by
Random Rovers can be modelled using a Poisson
distribution. The probability, to three decimal
places, that the team scores no goals is 0.135.
Given that the mean number of goals scored in a
match is an integer, find the probability that the
team scores fewer than three goals in a match.


10. The number of accidents occurring in a week in a
certain factory follows a Poisson distribution
with variance 3.2. Find


(a) the most likely number of accidents in a 
given week, 

(b) the probability that exactly seven accidents 
happen in a given fortnight. 


11. For each of the following sets of data, fit a
theoretical Poisson distribution with the same
mean.


@) Co Qn te Be Bee, Bed
7 uo 30.20 12 74
(b) pe Oke a AL ee
pas 4a 20. 8


12. A firm investigated the number of employees
suffering injuries whilst at work. The results
recorded below were obtained for a 52-week
period:


Number of employees injured in a week    Number of weeks
0                                        31
4 or more


Give reasons why one might expect this
distribution to approximate to a Poisson
distribution. Evaluate the mean and variance of
the data and explain why this gives further
evidence in favour of a Poisson distribution.
Using the calculated value of the mean, find the
theoretical frequencies of a Poisson distribution
for the number of weeks in which 0, 1, 2, 3, 4 or
more employees were injured.


13. Along a stretch of motorway, breakdowns
requiring the summoning of the breakdown
services occur with a frequency of 2.4 per day,
on average. Assuming that the breakdowns occur
randomly and that they follow a Poisson
distribution, find

(a) the probability that there will be exactly two
breakdowns on a given day,

(b) the smallest integer n such that the
probability of more than n breakdowns in a
day is less than 0.03.


USING THE POISSON DISTRIBUTION AS AN APPROXIMATION TO THE BINOMIAL DISTRIBUTION


When n is large (n > 50) and p is small (p < 0.1), the binomial distribution X ~ B(n, p)
can be approximated using a Poisson distribution with the same mean, i.e. X ~ Po(np).

The approximation gets better as n gets larger and p gets smaller.


Example 5.23


Eggs are packed in boxes of 500. On average 0.7% of the eggs are found to be broken when
the eggs are unpacked. Find, correct to two significant figures, the probability that in a box of
500 eggs,


(a) exactly three are broken,
(b) at least two are broken.


Solution 5.23


Let X be the number of broken eggs in a box of 500. 


P(egg is broken) = 0.007, so X ~ B(500, 0.007). 


E(X) = np = 500 x 0.007 = 3.5 


Since n > 50 and p < 0.1, use a Poisson approximation, X ~ Po(3.5) 


(a) $P(X = 3) = e^{-3.5}\frac{3.5^3}{3!}$
= 0.22 (2 s.f.)


(b) P(X ≥ 2) = 1 − (P(X = 0) + P(X = 1))
$= 1 - (e^{-3.5} + 3.5e^{-3.5})$
= 0.86 (2 s.f.)
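To see how close the approximation is in Example 5.23, the exact B(500, 0.007) probabilities can be set beside those of Po(3.5). This comparison is our own Python sketch, not part of the original solution; `math.comb` needs Python 3.8 or later.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


n, p = 500, 0.007
lam = n * p   # 3.5, the shared mean

# Exact binomial probability beside its Poisson approximation
for x in range(4):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))

exact_at_least_2 = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)
approx_at_least_2 = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(exact_at_least_2, approx_at_least_2)
```

The two sets of values agree to about two decimal places, which is the accuracy the example asks for.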


Example 5.24 


A Christmas draw aims to sell 5000 tickets, 50 of which will win a prize.

(a) A syndicate buys 200 tickets. Let X represent the number of these tickets that win a prize.
(i) Justify the use of the Poisson approximation for the distribution of X.
(ii) Calculate P(X ≤ 3).

(b) Calculate how many tickets should be bought in order for there to be a 90% probability
of winning at least one prize. (C)


Solution 5.24


P(a ticket wins a prize) = 50/5000 = 0.01


(a) Let X be the number of these tickets that win a prize.

Strictly speaking you do not have independent trials, but since the total number of tickets is very large, X can be
considered to be modelled by a binomial distribution where X ~ B(200, 0.01).

E(X) = np = 200 × 0.01 = 2.
(i) Since n > 50 and p < 0.1, use a Poisson approximation, X ~ Po(2).

(ii) P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
$= e^{-2}\left(1 + 2 + \frac{2^2}{2!} + \frac{2^3}{3!}\right)$
= 0.86 (2 s.f.)


(b) Let X be the number of these tickets that win a prize in n tickets,
so X ~ B(n, 0.01) and E(X) = np = 0.01n.

Assuming n > 50 and p < 0.1, use X ~ Po(0.01n).

You want P(X ≥ 1) = 0.9.

But $P(X \geq 1) = 1 - P(X = 0) = 1 - e^{-0.01n}$.
So $0.9 = 1 - e^{-0.01n}$, i.e. $e^{-0.01n} = 0.1$.

Taking logs to base e,
$-0.01n = \ln 0.1$
$n = \frac{\ln 0.1}{-0.01} = 230.25\ldots$

So the least integer value of n must be 231.
Check: if n = 230, np = 230 × 0.01 = 2.3 and $1 - e^{-2.3} = 0.8997\ldots < 0.9$;
if n = 231, np = 231 × 0.01 = 2.31 and $1 - e^{-2.31} = 0.9007\ldots > 0.9$.

231 tickets should be bought.

Note that n can be found by trial and improvement methods if logarithms are not used.
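The trial and improvement route mentioned in the note can be sketched in Python as follows (our own sketch; the constant names are hypothetical). It uses the same Poisson approximation, P(X ≥ 1) = 1 − e^(−0.01n), and also confirms the logarithm answer.

```python
import math

P_WIN = 0.01   # probability that one ticket wins (50 prizes / 5000 tickets)
TARGET = 0.9   # required probability of at least one prize


def prob_at_least_one_win(n):
    """Poisson approximation: X ~ Po(0.01 n), so P(X >= 1) = 1 - e^(-0.01 n)."""
    return 1 - math.exp(-P_WIN * n)


# Trial and improvement: increase n until the target is reached
n = 1
while prob_at_least_one_win(n) < TARGET:
    n += 1
print(n)  # prints 231

# The logarithm method gives the same boundary: n >= ln(0.1) / (-0.01)
n_from_logs = math.ceil(math.log(1 - TARGET) / -P_WIN)
```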


Exercise 5e The Poisson approximation to the binomial


1. The random variable X is B(100, 0.03).
Find the following probabilities using
(i) the binomial distribution,
(ii) a Poisson approximation.

(a) P(X = 0), (b) P(X = 2), (c) P(X = 4).


2. The probability that a bolt is defective is 0.2%.
Bolts are packed in boxes of 500.

(a) Find the probability that in a randomly
chosen box,
(i) there are two defective bolts,
(ii) there are more than three defective
bolts.

(b) Two boxes are picked at random from the
production line. Find the probability that
one has two defective bolts and the other
has no defective bolts.

(c) Three boxes are selected at random. Find
the probability that they contain no
defective bolts.


3. On average one in 200 cars breaks down on a
certain stretch of road per day. Find the
probability that, on a randomly chosen day,

(a) none of a sample of 250 cars breaks down,
(b) more than two of a sample of 300 cars
break down.


4. Two dice are thrown.

(a) What is the probability of throwing a
double six?

Two dice are thrown a total of 90 times.

(b) What is the probability that at least two
double sixes are thrown?


5. An aircraft has 116 seats. The airline has found,
from long experience, that on average 2.5% of
people who have bought tickets for a flight do
not arrive for that flight. The airline sells 120
tickets for a particular flight.


(a) Calculate, using a suitable approximation,
the probability that more than 116 people
arrive for the flight.

(b) Calculate also the probability that there are
empty seats on the flight. (C)


6. In a large town one person in 80, on average, has
blood of type X. If 200 blood donors are 
sampled at random, find an approximation to 
the probability that they include at least five 
people with blood type X. 

How many donors must be sampled in order that 
the probability of including at least one donor of 
type X is 90% or more? (AEB) 


7. A lottery has a very large number of tickets, one
in every 500 of which entitles the purchaser to a
prize. An agent sells 1000 tickets for the lottery.
Using a Poisson approximation, find, to three
decimal places, the probability that the number
of prize-winning tickets sold by the agent is


(a) less than three,
(b) more than five.


Calculate the minimum number of tickets the 
agent must sell to have a 95% chance of selling 
at least one prize-winning ticket. (NEAB) 


8. A manufacturer has found that 3% of seeds
produced do not germinate. Using a Poisson
approximation, find, to two significant figures,
the probability that in a pack containing 150
seeds,


(a) more than four fail to germinate,
(b) at least 145 germinate.


9. X is B(250, p). The value of p is such that it is
valid to apply a Poisson approximation. When
this is done, it is found that P(X = 0) = 0.0235.
Find the value of p.


10. The probability that I dial a wrong number when
making a telephone call is 0.015. In a typical
week I will make 50 telephone calls. Using a
Poisson approximation to a binomial model find,
correct to two decimal places, the probability
that in such a week,


(a) I dial no wrong numbers,
(b) I dial more than two wrong numbers.


Comment on the suitability of the binomial
model and of the Poisson approximation. (C)


11. A newspaper reports that 8.6% of adults in the
U.K. painted the outside of their houses.

A sample of 55 adults in the U.K. was selected.
Stating any necessary assumptions, show that the
number in the sample that painted the outsides
of their own houses can be approximated by a
Poisson distribution.
Using this approximation, find the probability
that fewer than four people in the sample painted
the outsides of their own houses. (C)


THE SUM OF INDEPENDENT POISSON VARIABLES 


For independent variables X and Y, if X ~ Po(λ) and Y ~ Po(μ), then X + Y ~ Po(λ + μ).


Example 5.25 


Two identical racing cars are being tested on a circuit. For each car, the number of mechanical
breakdowns can be modelled by a Poisson distribution with a mean of one breakdown in 100
laps. If a car breaks down it is attended and continues on the circuit. The first car is tested for
20 laps and the second car for 40 laps.


Find the probability that the service team is called out to attend to breakdowns 


(a) once,
(b) more than twice. 
Solution 5.25


Since the average number of breakdowns in 100 laps is one, the average number in 20 laps is
0.2 and the average number in 40 laps is 0.4.

Let X be the number of breakdowns of the first car, then X ~ Po(0.2).
Let Y be the number of breakdowns of the second car, then Y ~ Po(0.4).
Let T be the total number of breakdowns.

Then T = X + Y and T ~ Po(0.2 + 0.4), i.e. T ~ Po(0.6).


(a) $P(T = 1) = 0.6e^{-0.6}$
= 0.329 (3 d.p.)


(b) P(T > 2) = 1 − (P(T = 0) + P(T = 1) + P(T = 2))
$= 1 - e^{-0.6}\left(1 + 0.6 + \frac{0.6^2}{2!}\right)$
= 0.023 (3 d.p.)
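The result that a sum of independent Poisson variables is again Poisson can be checked numerically: P(X + Y = t) is the sum over k of P(X = k)P(Y = t − k). The Python sketch below is our own illustration (not a proof). It compares that sum with Po(λ + μ) for the values in Example 5.25.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def pmf_of_sum(t, lam, mu):
    """P(X + Y = t) for independent X ~ Po(lam), Y ~ Po(mu), by summing P(X = k) P(Y = t - k)."""
    return sum(poisson_pmf(k, lam) * poisson_pmf(t - k, mu) for k in range(t + 1))


lam, mu = 0.2, 0.4   # first car 20 laps, second car 40 laps
for t in range(4):
    print(t, pmf_of_sum(t, lam, mu), poisson_pmf(t, lam + mu))

p_more_than_two = 1 - sum(poisson_pmf(t, lam + mu) for t in range(3))
```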


Example 5.26 


The centre pages of the Weekly Sentinel consist of a page of film and theatre reviews and a
page of classified advertisements. The number of misprints in the reviews can be modelled
using a Poisson distribution with mean 2.3 and the number of misprints in the classified
section can be modelled by a Poisson distribution with mean 1.7.


Using cumulative Poisson probability tables, find
(a) the probability that on the centre pages there will be more than five misprints,


(b) the smallest integer n such that the probability that there are more than n misprints on the
centre pages is less than 5%.


Solution 5.26 


Let X be the number of misprints in the reviews, then X ~ Po(2.3).
Let Y be the number of misprints in the classified advertisements, then Y ~ Po(1.7).
Let T be the total number of misprints on the centre pages, then T = X + Y and
T ~ Po(2.3 + 1.7), i.e. T ~ Po(4).

The cumulative tables are printed on page 647 and the relevant extract is shown here:

λ = 4.0
r     P(T ≤ r)
0     0.0183
1     0.0916
2     0.2381
3     0.4335
4     0.6288
5     0.7851
6     0.8893
7     0.9489
8     0.9786
9     0.9919
10    0.9972
11    0.9991
12    0.9997
13    0.9999
14    1.0000

(a) P(T > 5) = 1 − P(T ≤ 5)
= 1 − 0.7851
= 0.215 (3 d.p.)

(b) You need the smallest value of n such that
P(T > n) < 0.05,
i.e. 1 − P(T ≤ n) < 0.05,
so P(T ≤ n) > 0.95.
From the tables, P(T ≤ 7) = 0.9489 < 0.95
and P(T ≤ 8) = 0.9786 > 0.95.
The smallest value of n is 8.


Exercise 5f Sums of Poisson variables 


1. Telephone calls reach a secretary independently 
and at random, internal ones at a mean rate of 
two in any five-minute period, and external ones 
at a mean rate of one in any five-minute period. 
Calculate the probability that there will be more 
than two calls in any period of two minutes. 

(O & C)


2. During a weekday, heavy lorries pass a census
point P on a village high street independently
and at random times. The mean rate for
westward travelling lorries is two in any
30-minute period, and for eastward travelling
lorries is three in any 30-minute period.


Find the probability


(a) that there will be no lorries passing P in a
given ten-minute period,

(b) that at least one lorry from each direction
will pass P in a given ten-minute period,

(c) that there will be exactly four lorries passing
P in a given 20-minute period. (O & C)
3. A large number of screwdrivers from a trial
production run is inspected. It is found that the
cellulose acetate handles are defective on 1% and
that the chrome steel blades are defective on
1½% of the screwdrivers, the defects occurring
independently. 


(a) What is the probability that a sample of 80 
contains more than two defective 
screwdrivers? 

(b) What is the probability that a sample of 80 
contains at least one screwdriver with both a 
defective handle and defective blade? 


(O & C) 


4. A restaurant kitchen has two food mixers, A and
B. The number of times per week that A breaks
down has a Poisson distribution with mean 0.4,
while independently the number of times that B
breaks down in a week has a Poisson distribution
with mean 0.1. Find, to three decimal places, the
probability that in the next three weeks


(a) A will not break down at all,
(b) each mixer will break down exactly once,
(c) there will be a total of two breakdowns. (L)




Summary 


• Uniform distribution

$P(X = x) = \frac{1}{n}$ for $x = x_1, x_2, \ldots, x_n$


• Geometric distribution X ~ Geo(p)

p is the probability of a successful outcome.

X is the number of independent trials needed to obtain the first successful outcome.

$P(X = x) = q^{x-1}p$ for x = 1, 2, 3, ... to infinity, where q = 1 − p.

Note that X cannot be zero.

$E(X) = \frac{1}{p}$, $\text{Var}(X) = \frac{q}{p^2}$, mode = 1.


• Binomial distribution X ~ B(n, p)
p is the probability of a successful outcome.
X is the number of successful outcomes in n independent trials.
$P(X = x) = {}^nC_x\, q^{n-x} p^x$ for x = 0, 1, 2, ..., n, where q = 1 − p.
E(X) = np and Var(X) = npq.

• Poisson distribution X ~ Po(λ)

X is the number of occurrences of an event in a given interval of time or space, when the
mean number of occurrences in the given interval is λ.

$P(X = x) = e^{-\lambda}\frac{\lambda^x}{x!}$ for x = 0, 1, 2, 3, ... to infinity
E(X) = λ and Var(X) = λ.


• Poisson approximation to the binomial distribution

If X ~ B(n, p) with n > 50 and p < 0.1, then X ~ Po(np) approximately.

• Sum of independent Poisson variables

If X ~ Po(λ) and Y ~ Po(μ) are independent, then X + Y ~ Po(λ + μ).


Miscellaneous worked examples 


Example 5.27


Every working day Mr Driver pulls out from his drive on to a main road in such a way that 
there is a very small probability p that his car will be involved in a collision. 


(a) Show that in a five-day week the probability that there will be no collision is (1 − p)⁵.
(b) State one assumption that is made in this calculation. 


(c) Using a binomial expansion, show that the probability that there will be at least one 
collision in a five-day week is approximately 5p. 


(d) Given that p = 0.001, use a calculator to find the probability that Mr Driver will avoid a 
collision in 500 working days. (NEAB) 


Solution 5.27 


(a) X is the number of collisions in five days, X ~ B(5, p).
$P(X = 0) = {}^5C_0\, q^5 p^0$
$= (1 - p)^5$, where q = 1 − p.

(b) The circumstances remain the same for the five days; the events are independent.
(c) $P(X \geq 1) = 1 - P(X = 0)$
$= 1 - (1 - p)^5$
$= 1 - (1 - 5p + 10p^2 - \cdots)$
$\approx 1 - 1 + 5p$ (ignoring higher powers of p since p is small)
$P(X \geq 1) \approx 5p$


(d) X is the number of collisions in 500 days, X ~ B(500, 0.001).
Using the binomial distribution,
$P(X = 0) = 0.999^{500} = 0.606$ (3 d.p.)

Alternatively, since n > 50 and p < 0.1, using a Poisson approximation with
mean = np = 0.5, X ~ Po(0.5), so
$P(X = 0) = e^{-0.5} = 0.607$ (3 d.p.)


Example 5.28 


A salesman sells goods by telephone. The probability that any particular call achieves a sale
is 1/12, independently of all other calls. The salesman continues to make calls until one call
achieves a sale.


(a) Name an appropriate distribution with which to model this situation.


(b) Calculate the probability that the call that achieves a sale
(i) is the fifth call made,
(ii) does not occur in the first five calls.


(c) Obtain the mean and variance of the number of calls the salesman makes. (C)


Solution 5.28 


(a) X is the number of calls until a call achieves a sale.
X can be modelled by a geometric distribution, X ~ Geo(1/12).

(b) (i) $P(X = 5) = q^4 p$
$= \left(\frac{11}{12}\right)^4 \times \frac{1}{12}$
= 0.059 (2 s.f.)
(ii) $P(X > 5) = q^5$
$= \left(\frac{11}{12}\right)^5$
= 0.65 (2 s.f.)
(c) $E(X) = \frac{1}{p} = 12$
$\text{Var}(X) = \frac{q}{p^2} = \frac{11/12}{(1/12)^2} = 132$


So E(X) = 12 and Var(X) = 132.
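The geometric results in Example 5.28 can be checked with exact fractions. This is our own Python sketch using the standard `fractions` module, with p = 1/12 (the value consistent with E(X) = 12 in the solution).

```python
from fractions import Fraction

p = Fraction(1, 12)   # probability that a call achieves a sale
q = 1 - p

p_fifth_call = q ** 4 * p          # P(X = 5) = q^4 p
p_not_in_first_five = q ** 5       # P(X > 5) = q^5
mean = 1 / p                       # E(X) = 1/p
variance = q / p ** 2              # Var(X) = q/p^2

print(float(p_fifth_call), float(p_not_in_first_five), mean, variance)
```

Working with `Fraction` keeps E(X) and Var(X) exact, so they print as 12 and 132 rather than as decimals.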


Example 5.29 
The number of births announced in the personal column of a local weekly newspaper may be
modelled by a Poisson distribution with mean 2.4. 


Find the probability that, in a particular week, 


(a) three or fewer births will be announced, 
(b) exactly four births will be announced. 


Solution 5.29 


X is the number of birth announcements in a week, X ~ Po(2.4). 


(a) P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
$= e^{-2.4} + 2.4e^{-2.4} + \frac{2.4^2}{2!}e^{-2.4} + \frac{2.4^3}{3!}e^{-2.4}$
= 0.779 (3 s.f.)


(b) $P(X = 4) = e^{-2.4}\frac{2.4^4}{4!}$
= 0.125 (3 s.f.)
Example 5.30


Weak spots occur at random in the manufacture of a certain cable at an average rate of one
per 100 metres.

If X represents the number of weak spots in 100 metres of cable, write down the distribution
of X.


Lengths of this cable are wound on to drums. Each drum carries 50 metres of cable. 

Find the probability that a drum will have three or more weak spots. 

A contractor buys five such drums. Find the probability that two have just one weak spot each 
and the other three have none. (AEB) 


Solution 5.30 


X is the number of weak spots in 100 m of cable, X ~ Po(1). 
Let Y be the number of weak spots in 50 m of cable, Y ~ Po(0.5). 


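P(Y ≥ 3) = 1 − P(Y ≤ 2)
= 1 − (P(Y = 0) + P(Y = 1) + P(Y = 2))
$= 1 - \left(e^{-0.5} + 0.5e^{-0.5} + \frac{0.5^2}{2!}e^{-0.5}\right)$
$= 1 - e^{-0.5}\left(1 + 0.5 + \frac{0.5^2}{2!}\right)$
= 0.01438...
= 0.014 (2 s.f.)
P(a drum has one weak spot) = P(Y = 1) = $0.5e^{-0.5}$
P(a drum has no weak spot) = P(Y = 0) = $e^{-0.5}$
In five drums,

P(2 have one weak spot, 3 have none)
$= {}^5C_2 \times (P(Y = 1))^2 \times (P(Y = 0))^3$
$= 10 \times (0.5e^{-0.5})^2 \times (e^{-0.5})^3$
$= 10 \times 0.25e^{-1} \times e^{-1.5}$
$= 2.5e^{-2.5}$
= 0.21 (2 s.f.)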
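Example 5.30 combines a Poisson model for one drum with a binomial count over five drums. A minimal Python sketch of both steps (our own; the variable names are hypothetical, and `math.comb` needs Python 3.8 or later):

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam_drum = 1 * 50 / 100   # 1 weak spot per 100 m, 50 m per drum, so 0.5

# P(Y >= 3) = 1 - P(Y <= 2) for one drum
p_three_or_more = 1 - sum(poisson_pmf(y, lam_drum) for y in range(3))

# Five drums: choose which 2 have one weak spot; the other 3 have none
p_one = poisson_pmf(1, lam_drum)
p_none = poisson_pmf(0, lam_drum)
p_pattern = math.comb(5, 2) * p_one ** 2 * p_none ** 3

print(round(p_three_or_more, 5), round(p_pattern, 5))
```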


Miscellaneous exercise 5g 


1. The random variable X has the binomial distribution B(10, 0.35). Find P(X < 4).

The random variable Y has the Poisson distribution with mean 3.5. Find P(2 < Y < 5). (L)


2. The number of white corpuscles on a slide has a
Poisson distribution with mean 3.2.

(a) Find the most likely number of white
corpuscles on a slide.

(b) Calculate correct to three decimal places the
probability of obtaining this number.

(c) If two such slides are prepared, what is the
probability, correct to three decimal places,
of obtaining at least two white corpuscles in
total on the two slides?


3. Copies of an advertisement for a course in
practical statistics are sent to Mathematics
teachers in a large city. For each teacher who
receives a copy, the probability of subsequently
attending the course is 0.09.

Twenty teachers receive a copy of the advertisement.
What is the probability that the number
who subsequently attend the course will be

(a) two or fewer,
(b) exactly four. (AEB)


4. (a) Experience shows that a charity receives
replies to letters at the rate of eight per 100.
Calculate, giving each answer to two
decimal places, the probability that the
number of replies from ten letters is

(i) 0, (ii) 1, (iii) 2, (iv) more than 2.

(b) On average, out of every 1000 items made
by a certain factory worker, one item is
defective. The items are inspected in batches.
A large number of batches, each of n items,
made by the worker is inspected. Evaluate n
in each of the following cases.

(i) The mean number of defective items per
batch is 0.045.

(ii) The standard deviation of the number
of defective items per batch is 0.333.

The probability of at least one defective item
in a batch of N items is greater than 0.02.
Use this information to write down an
inequality which is satisfied by N. (C)


5. A store sells word processors. The proportion
which are returned as faulty has been found to
be 0.035. During the Christmas period of 1995,
the store sold 104 word processors. The number
of these which will be returned as faulty is X.
Assuming independence, state the exact
distribution of X.

Give reasons why this distribution can be
approximated by a Poisson distribution.
Calculate the probability that at most three of
the word processors will be returned as faulty. (C)


6. (a) The probability that a seed of a particular
variety of bean will germinate when sown is
0.96.

Seeds are sold in packets of 50. If a packet is
selected at random, calculate the probability
that the number of seeds which will
germinate when sown is exactly

(i) 50, (ii) 49, (iii) 48.

If 200 packets of seeds are selected, estimate
the number of packets from each of which
fewer than 48 seeds will germinate.

If three packets of seeds are selected,
calculate, to three decimal places, the
probability that at least 149 of the 150 seeds
will germinate.

(b) A self-employed worker contacts an agency
every morning in an attempt to obtain work
for the day. The probability that work is
available on any given day is 0.9. Calculate,
for a period of 100 working days, the mean
and the standard deviation of the number of
days on which work is available. (C)


7. A car hire firm has three cars, which it hires out
on a daily basis. The number of cars demanded
per day follows a Poisson distribution with
mean 2.1.

(a) Find the probability that exactly two cars
are hired out on any one day.

(b) Find the probability that all cars are in use
on any one day.

(c) Find the probability that all cars are in use
on exactly three days of a five-day week.

(d) Find the probability that exactly ten cars are
demanded in a five-day week. Explain
whether or not such a demand could always
be met.

(e) It costs the firm £20 a day to run each car,
whether it is hired out or not. The daily hire
charge per car is £50. Find the expected
daily profit. (MEI)


8. A factory produces a particular type of electronic
component. The probability of a component
being acceptable is 0.95. The components are
packed in boxes of 24.

(a) Calculate the probability that a box, chosen
at random, contains exactly 22 acceptable
components.

All boxes are inspected and a box is rejected if it
contains fewer than 22 acceptable components.

(b) Calculate the probability that a box, chosen
at random, is rejected.

The factory produces 80 boxes per day over a
long period of time.

(c) Estimate the mean and the standard
deviation of the number of boxes rejected
per day.

It is proposed to introduce an alternative policy
with regard to packing and inspection, as
follows:

The daily production of components is to be
packed in 160 boxes, each containing 12
components, and boxes containing fewer than 11
acceptable components are to be rejected.

(d) Estimate the mean number of boxes rejected
per day under this alternative policy.

(e) Explain whether or not this alternative
policy would lead to a decrease in the
expected number of components rejected per
day. (C)


9. A large bin contains 5250 used golf balls, 1260
of which are unusable. The random variable R
denotes the number of unusable balls in a
random sample of ten balls, selected without
replacement, from the bin.

(a) Explain why R may be approximated as a
binomial random variable with parameters
10 and 0.24.

(b) Hence calculate the probability that the
sample contains
(i) exactly three unusable balls,
(ii) at most three unusable balls. (NEAB)


10. The number of night calls to a fire station in a 


11. 


12, 


13. 


14, 


small town can be modelled by a Poisson 
distribution with mean 4.2 per night. Find the 
probability that on a particular night there wiil 
be three or more calls to the fire station. 

State what needs to be assumed about the calls to 
the fire station in order to justify a Poisson 
model. (C) 


A television repair company uses a particular 
spare part at a rate of four per week. 

Assuming that requests for this spare part occur 
at random, find the probability that 


(a) exactly six are used in a particular week, 

(b) at least ten are used in a two-week period, 

(c) exactly six are used in each of three consecutive weeks.


The manager decides to replenish the stock of this spare part to a constant level n at the start of each week.

(d) Find the value of n such that, on average, the stock will be insufficient no more than once in a 52-week year. (L)


In the Growmore Market Garden plants are 
inspected for the presence of the deadly red 
angus leaf bug. The number of bugs per leaf is 
known to follow a Poisson distribution with 
mean one. What is the probability that any one 
leaf on a given plant will have been attacked (at least one bug is found on it)?


A random sample of 12 plants is taken. For each 
plant ten leaves are selected at random and 
inspected for these bugs. If more than eight 
leaves on any particular plant have been attacked 
then the plant is destroyed. What is the 
probability that exactly two of these 12 plants 
are destroyed? (AEB) 


In Blackbury it is known that 0.4% of people have blood group AB.

Blackbury High School has 1000 pupils, with 28 pupils in class 4T.

(a) (i) Write down a distribution that could be used to model the number of pupils in class 4T with blood group AB.
(ii) Hence calculate the probability that there are exactly two pupils in class 4T with blood group AB.

(b) Using an appropriate distributional approximation, calculate the probability that there are fewer than six pupils at the school with blood group AB.

(c) State an assumption that you have made in answering this question. (NEAB)


The probability that a fisherman has a successful 
day’s fishing is 0.6. Given that he fishes for six 
days every week, find the probability that in any 
week he has 

(a) exactly four successful days, 

(b) at least two successful days. 


The fisherman fishes for six days every week for many weeks. Estimate the mean and the standard deviation of the number of successful days per week over this period. (C)


A large number of groups, each consisting of 12 adults, are selected at random from the population of a particular town. Given that 30% of the adults in this town are car owners, calculate


(a) the probability that a group contains not 
more than two car owners, 

(b) the mean and the standard deviation of the 
number of car owners in the groups. (C) 


In a large city one person in five is left-handed. 


(a) Find the probability that in a random 
sample of ten people 
(i) exactly three will be left-handed, 
(ii) more than half will be left-handed. 

(b) Find the most likely number of left-handed 
people in a random sample of 12 people. 

(c) Find the mean and the standard deviation of 
the number of left-handed people in a 
random sample of 25 people. 

(d) How large must a random sample be if the probability that it contains at least one left-handed person is to be greater than 0.95?


Batches of 400 shells in the First World War were classified as ‘accepted’ or ‘rejected’ by testing a small number of shells from the batch. Tested shells are either ‘good’ or ‘bad’; the probability that a randomly selected shell is good is p.

(a) In one testing method, eight shells from a batch (of 400) are selected at random and tested. The batch is accepted if at least three of these eight shells are good. Use a binomial distribution, with p = 0.2, to find the probability that the batch is accepted.

(b) In a second testing method, each batch of 400 is subdivided into four sub-batches of 100 shells each. Two shells from each sub-batch are tested, and the sub-batch is accepted if at least one of the two shells is good. Use a binomial distribution, with p = 0.2,
(i) to show that the probability that one particular sub-batch is accepted is 0.36,
(ii) to find the probability that, out of four sub-batches, at least three are accepted.

(c) In a third testing method, four shells are selected and the batch (of 400) is accepted if all four of the shells are good. The probability that the batch is accepted is 0.01. Assuming a binomial distribution, find the value of p.

(d) State one condition which must be satisfied by the shells if a binomial model is to be valid, and give a reason why it may not be satisfied in this context. (C)
18. Define the Poisson distribution and state its mean and variance.

The number of telephone calls received at a switchboard in any time interval of length T minutes has a Poisson distribution with mean 4T. The operator leaves the switchboard unattended for five minutes.

Calculate to three decimal places the probabilities that there are (a) no calls, (b) four or more calls in her absence.

Find to three significant figures the maximum 
length of time in seconds for which the operator 
could be absent with a 95% probability of not 
missing a call. (NEAB) 


A shop sells a particular make of radio at a rate 
of four per week on average. The number sold in 
a week has a Poisson distribution. 


(a) Find the probability that the shop sells at 
least two in a week. 

(b) Find the smallest number that can be in
stock at the beginning of a week in order to 
have at least a 99% chance of being able to 
meet all demands during that week. (L) 


The independent Poisson random variables X 
and Y have means 2 and 5, respectively. Obtain 
the mean and variance of the random variables 


(a) Y − X,
(b) 2Y + 10.

For each of these random variables give one reason why the distribution is not Poisson. (NEAB)


Fanfold paper for computer printers is made by 
putting perforations every 30 cm in a continuous 
roll of paper. A box of fanfold paper contains 
2000 sheets. State the length of the continuous 
roll from which the box of paper is produced. 
The manufacturers claim that faults occur at random and at an average rate of one per 240 metres of paper. State an appropriate distribution for the number of faults per box of paper. Find the probability that a box of paper has no faults and also the probability that it has more than four faults.
Two copies of a report which runs to 100 sheets per copy are printed on this sort of paper. Find the probability that there are no faults in either copy of the report and also the probability that just one copy is faulty. (MEI)


A randomly chosen doctor in general practice 
sees, on average, one case of a broken nose per 
year and each case is independent of other 
similar cases. 


(a) Regarding a month as a twelfth part of a year,

(i) show that the probability that, between 
them, three such doctors see no cases of 
a broken nose in a period of one month 
is 0.779, correct to three significant 
figures, 

(ii) find the variance of the number of cases 
seen by three such doctors in a period 
of six months. 

(b) Find the probability that, between them, three such doctors see at least three cases in one year.

(c) Find the probability that, of three such 
doctors, one sees three cases and the other 
two see no cases in one year. (C) 


Lemons are packed in boxes, each box 
containing 200. It is found that, on average, 
0.45% of the lemons are bad when the boxes are 
opened. Use the Poisson distribution to find the 
probabilities of 0, 1, 2, and more than two bad 
lemons in a box. 

A buyer who is considering buying a 
consignment of several hundred boxes checks the 
quality of the consignment by having a box 
opened. If the box opened contains no bad 
lemons he buys the consignment. If it contains 
more than two bad lemons he refuses to buy, and 
if it contains one or two bad lemons he has 
another box opened and buys the consignment if 
the second box contains fewer than two bad 
lemons. What is the probability that he buys the
consignment? 

Another buyer checks consignments on a 
different basis. He has one box opened; if that 
box contains more than one bad lemon he asks 
for another to be opened and does not buy if the 
second also contains more than one bad lemon. 
What is the probability that he refuses to buy the 
consignment? 


A hire company has two electric lawnmowers 
which it hires out by the day. The number of 
demands per day for a lawnmower has the form 
of a Poisson distribution with mean 1.50. In a period of 100 working days, how many times do you expect

(a) neither of the lawnmowers to be used,
(b) some requests for the lawnmowers to have to be refused?

If each lawnmower is to be used an equal amount, on how many days in a period of 100 working days would you expect a particular lawnmower not to be in use? (MEI)


25. The number of oil tankers arriving at a port between successive high tides has a Poisson distribution with mean 2. The depth of the water is such that loaded vessels can enter the dock area only on the high tide. The port has dock space for only three tankers, which are discharged and leave the dock area before the next tide. Only the first three loaded tankers waiting at any high tide go into the dock area; any others must await another high tide.

Starting from an evening high tide after which no ships remain waiting their turn, find (to three decimal places) the probabilities that after the next morning’s high tide

(a) the three dock berths remain empty,
(b) the three berths are all filled.

Find (to two decimal places) the probability that no tankers are left waiting outside the dock area after the following evening’s high tide. (NEAB)


In the manufacture of commercial carpet, small faults occur at random in the carpet at an average rate of 0.95 per 20 m². Find the probability that in a randomly selected 20 m² area of this carpet

(a) there are no faults,
(b) there are at most two faults.

The ground floor of a new office block has 10 rooms. Each room has an area of 80 m² and has been carpeted using the same commercial carpet described above. For any one of these rooms, determine the probability that the carpet in the room

(c) contains at least two faults,
(d) contains exactly three faults,
(e) contains at most five faults.

Find the probability that in exactly half of these ten rooms the carpets will contain exactly three faults. (AEB)


During each working day in a certain factory a number of accidents occur independently according to a Poisson distribution with mean 0.5.

Calculate the probability that

(a) during any one day there are two or more accidents,
(b) during two consecutive days there are exactly three accidents altogether.

Out of 50 consecutive five-day weeks, how many would you expect to be accident-free?
Mixed test 5A


1. A series of 7 experiments is carried out and in each experiment the only possible outcomes are ‘success’ and ‘failure’. The total number of successes is denoted by X. State two conditions which must be satisfied for the distribution of X to be modelled by a binomial distribution.
Gromit invites 11 friends to a party. For each 
friend, the probability that he or she will accept 
the invitation may be taken to be 3. Use a 
binomial distribution to calculate the probability 
that 

(a) exactly nine, 

(b) fewer than nine, 
of the friends will accept the invitation. 

Give a reason why a binomial distribution might 
not be a good model in this situation. (C) 


The weekly number of detached dwellings sold
by an estate agent may be modelled by a Poisson 
distribution with mean 2.75 and, independently, 
the weekly number of other dwellings sold may 
be modelled by a Poisson distribution with 
mean 3.25. 

Determine the probability that the estate agent 
sells 


(a) exactly four detached dwellings in a week, 

(b) between ten and 15, inclusive, detached 
dwellings over a four-week period, 

(c) fewer than five dwellings in a week. (NEAB) 


In one part of the country, one person in 80 has blood of Type P. A random sample of 150 blood donors is chosen from that part of the country. Let X represent the number of donors in the sample having blood of Type P.

(a) State the distribution of X. Find the parameter of the Poisson distribution which can be used as an approximation. Give a reason why a Poisson approximation is appropriate.

(b) Using the Poisson distribution, calculate the probability that in the sample of 150 donors at least two have blood of Type P.

(c) A hospital urgently requires blood of Type P. How large a random sample of donors must be taken in order that the probability of finding at least one donor of Type P should be 0.99 or more? (MEI)


A geography student is studying the distribution of telephone boxes in a large rural area where there is an average of 300 boxes per 500 km². A map of part of the area is divided into 50 squares, each of area 1 km², and the student wishes to model the number of telephone boxes per square.

(a) Suggest a suitable simple model the student could use and specify any parameters required.

One of the squares is picked at random.

(b) Find the probability that this square does not contain any telephone boxes.

(c) Find the probability that this square contains at least three telephone boxes.

The student suggests using this model on another map of a large city and surrounding villages.

(d) Comment, giving your reason briefly, on the suitability of the model in this situation. (L)


A crossword puzzle is published in The Times each day of the week, except Sunday. A woman is able to complete, on average, eight out of ten of the crossword puzzles.

(a) Find the expected value and the standard deviation of the number of completed crosswords in a given week.

(b) Show that the probability that she will complete at least five in a given week is 0.655 (to three significant figures).

(c) Given that she completes the puzzle on Monday, find, to three significant figures, the probability that she will complete at least four in the rest of the week.

(d) Find, to three significant figures, the probability that, in a period of four weeks, she completes four or less in only one of the four weeks. (C)


Mixed test 5B 


1. In practising the high jump a certain athlete has 
five attempts at a particular height. The 
probability that she succeeds at any one attempt 
is p. Find an expression, in terms of p, for the 
probability that she succeeds 


(a) exactly four times, 
(b) exactly two times. 


The probability that she succeeds exactly four 
times is twice the probability that she succeeds 
exactly two times. Find the value of p. (C)


2. Before starting to play the game ‘Snakes and Ladders’ each player throws an ordinary
unbiased die until a six is obtained. The number 
of throws before a player starts is the random 
variable Y, where Y takes the values 1, 2, 3, .... 


(a) Name the probability distribution of Y, 
stating a necessary assumption. 

(b) Find Var(Y). 

(c) Two people play Snakes and Ladders. 
Calculate the probability that they will each 
need at least five throws before starting. (C) 


3. State, giving your reasons, the distribution which you would expect to be appropriate in describing

(a) the number of heads in ten throws of a penny,
(b) the number of blemishes per square metre of sheet metal.

A building has an automatic telephone exchange. The number X of wrong connections in any one day is a Poisson variable with parameter λ. Find, in terms of λ, the probability that in any one day there will be

(c) exactly three wrong connections,
(d) three or more wrong connections.

Evaluate, to three decimal places, these probabilities when λ = 0.5. Find, to three decimal places, the largest value of λ for the probability
of one or more wrong connections in any day to
be at most {. (L) 


4. The number of customers entering a certain
branch of a bank on a Monday lunchtime may 
be modelled by a Poisson distribution with mean 
2.4 per minute. 


(a) Find the probability that, during a particular 
minute, four or more customers enter the 
branch. 


The probability that a customer, who enters the 
branch, intends to open a new account is 0.002 
and is independent of the intentions of other 
customers. During a particular morning 450 
customers enter the bank. 


(b) Use a suitable approximation to find the 
probability that three or fewer of these 450 
customers intend to open new accounts. 

(AEB) 


5. A process for making plate glass produces small bubbles (imperfections) scattered at random in the glass, at an average rate of four small bubbles per 10 m².

Assuming a Poisson model for the number of small bubbles, determine, to three decimal places, the probability that a piece of glass 2.2 m × 3.0 m will contain

(a) exactly two small bubbles,
(b) at least one small bubble,
(c) at most two small bubbles.

Show that the probability that five pieces of glass, each 2.5 m by 2.0 m, will all be free of small bubbles is e⁻¹⁰.

Find, to three decimal places, the probability that five pieces of glass, each 2.5 m by 2.0 m, will contain a total of at least ten small bubbles. (L)


Probability distributions II - continuous variables


In this chapter you will learn

• about probability density functions for continuous random variables
• how to find probabilities by calculating areas under curves
• how to find
  - the expectation, E(X), of the continuous random variable X
  - the expectation of any function of X
  - the variance of X
  - the mode
• about the cumulative distribution function, F(x)
• how to find the median, quartiles and other percentiles
• how to obtain the probability density function f(x) from the cumulative distribution function F(x)
• about the rectangular (uniform) distribution


CONTINUOUS RANDOM VARIABLES 


The following are examples of continuous random variables: 


the mass, in grams, of a bag of sugar packaged by a particular machine,
the time taken, in minutes, to perform a task, 

the height, in centimetres, of a five-year-old girl, 

the lifetime, in hours, of a 100-watt light bulb. 


PROBABILITY DENSITY FUNCTION (P.D.F.) 


A continuous random variable X is given by its probability density function (p.d.f.), which is specified for the range of values for which x is valid. The function can be illustrated by a curve, y = f(x). Note that f(x) cannot be negative anywhere in the specified range, that is, f(x) ≥ 0 for every valid x.


Probabilities are given by the area under the curve. It is sometimes possible to find an area by 
geometry, for example by using formulae for the area of a triangle or a trapezium. Often, 
however, areas need to be calculated using integration. 


Example 6.1 

X is the delay, in hours, of a flight from Chicago, where 

f(x) = 0.2 − 0.02x, 0 < x < 10

Find 

(a) the probability that the delay will be less than four hours, 

(b) the probability that the delay will be between two and six hours. 
Solution 6.1 

It is useful to draw a sketch of f(x). 


Note that since f(x) is valid for 0 < x < 10, the delay can be between 0 and 10 hours.

[Sketch of y = f(x): a straight line falling from 0.2 at x = 0 to 0 at x = 10]

(a) The probability that the delay will be less than four hours is given by the area under the curve between 0 and 4.

Method 1 - using geometry

In this example it is easy to calculate the area using A = ½(a + b)h, the formula for the area of a trapezium with parallel sides a and b a distance h apart:
$$\begin{aligned} a &= f(0) = 0.2, \quad h = 4 \\ b &= f(4) = 0.2 - 0.02 \times 4 = 0.12 \\ A &= \tfrac12(a + b)h = \tfrac12(0.2 + 0.12) \times 4 = 0.64 \end{aligned}$$


Method 2 - using integration
$$P(0 < X < 4) = \int_0^4 (0.2 - 0.02x)\,dx = \left[0.2x - 0.01x^2\right]_0^4 = 0.8 - 0.16 = 0.64$$


The probability that the delay will be less than four hours is 0.64. 


(b) The probability that the delay will be between two and six hours is given by the area under the curve between 2 and 6.

Method 1 - using geometry:
$$\begin{aligned} f(2) &= 0.2 - 0.02 \times 2 = 0.16 \\ f(6) &= 0.2 - 0.02 \times 6 = 0.08 \\ A &= \tfrac12(a + b)h = \tfrac12(0.16 + 0.08) \times 4 = 0.48 \end{aligned}$$
P(2 < X < 6) = 0.48

Method 2 - using integration:
$$\begin{aligned} P(2 < X < 6) &= \int_2^6 (0.2 - 0.02x)\,dx \\ &= \left[0.2x - 0.01x^2\right]_2^6 \\ &= 1.2 - 0.36 - (0.4 - 0.04) \\ &= 0.48 \end{aligned}$$


The probability that the delay will be between two and six hours is 0.48. 
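The areas in Example 6.1 can be checked by exact integration. The Python sketch below is only an illustration, not the book's method: it assumes f(x) is a polynomial on its range, stores it as a list of coefficients, and uses the standard-library `fractions` module so the answers come out exactly. The helper name `integrate_poly` is hypothetical, introduced here for the sketch.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    a, b = F(a), F(b)
    return sum(F(c) * (b ** (i + 1) - a ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

# f(x) = 0.2 - 0.02x on 0 < x < 10; strings keep the decimals exact
f = [F("0.2"), F("-0.02")]

p_less_4 = integrate_poly(f, 0, 4)   # P(X < 4): area from 0 to 4
p_2_to_6 = integrate_poly(f, 2, 6)   # P(2 < X < 6)
total = integrate_poly(f, 0, 10)     # total area, which should equal 1

print(float(p_less_4), float(p_2_to_6), float(total))  # 0.64 0.48 1.0
```

Using fractions rather than floating point means the check agrees exactly with the hand working (16/25 = 0.64 and 12/25 = 0.48).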


Notice that the total area under the curve gives the total probability. In the above example it is easy to check by finding the area of the triangle.
$$\text{Area of triangle} = \tfrac12 \times \text{base} \times \text{height} = \tfrac12 \times 10 \times 0.2 = 1$$
Alternatively,
$$\int_0^{10} f(x)\,dx = \int_0^{10} (0.2 - 0.02x)\,dx = \left[0.2x - 0.01x^2\right]_0^{10} = 2 - 1 = 1$$


Note that it is not possible to find the probability that the delay is, say, exactly three hours. If you try to integrate, you get
$$P(X = 3) = \int_3^3 f(x)\,dx = 0$$


You can only find the probability that X lies within a particular range. It is also not possible to distinguish between

P(2 < X < 6), P(2 ≤ X < 6), P(2 < X ≤ 6), P(2 ≤ X ≤ 6),

so there is no need to worry about whether the inequality is strict or not.


Example 6.2 


X is the continuous variable, the mass, in kilograms, of a substance produced per minute in an 
industrial process, where 


$$f(x) = \begin{cases} \tfrac{1}{36}x(6 - x) & 0 < x < 6 \\ 0 & \text{otherwise} \end{cases}$$

Find the probability that the mass is more than 5 kg.


Solution 6.2 


Note that f(x) is a quadratic function and use this to help to draw the sketch of f(x), noting 
that f(x) = 0 when x = 0 and x =6. 


Since you want the probability that X is more than 5, shade the area between 5 and 6.


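You will need to find this by integrating:
$$\begin{aligned} P(X > 5) &= \int_5^6 \tfrac{1}{36}x(6 - x)\,dx \\ &= \tfrac{1}{36}\left[3x^2 - \tfrac{x^3}{3}\right]_5^6 \\ &= \tfrac{1}{36}\left(36 - \tfrac{100}{3}\right) \\ &= 0.074 \text{ (3 d.p.)} \end{aligned}$$
The probability that the mass is more than 5 kg is 0.074 (3 d.p.).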
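As an illustrative check (not part of the original solution), the same exact-integration idea confirms both that the p.d.f. in Example 6.2 has total area 1 and the value of P(X > 5). The sketch assumes the coefficient is 1/36, which is the value that makes the total area 1; `integrate_poly` is a hypothetical helper defined in the block.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    a, b = F(a), F(b)
    return sum(F(c) * (b ** (i + 1) - a ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

# f(x) = (1/36) x (6 - x) = (1/6) x - (1/36) x**2 on 0 < x < 6
f = [0, F(1, 6), F(-1, 36)]

assert integrate_poly(f, 0, 6) == 1       # a valid p.d.f. has total area 1
p_more_than_5 = integrate_poly(f, 5, 6)   # P(X > 5)
print(p_more_than_5, round(float(p_more_than_5), 3))  # 2/27 0.074
```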


In general, for a continuous random variable X with p.d.f. f(x), valid for a ≤ x ≤ b, and for a ≤ x₁ < x₂ ≤ b,
$$P(x_1 < X < x_2) = \int_{x_1}^{x_2} f(x)\,dx$$
Remember that in an experimental approach, the area under the histogram represents 
frequency. In a theoretical approach, the area under the curve y = f(x) represents probability. 


Example 6.3 


A continuous random variable has p.d.f. f(x) = kx² for 0 < x < 4.


(a) Find the value of the constant k. 
(b) Find P(1 < X <3). 


Solution 6.3 


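(a) Since the total probability is 1,
$$\begin{aligned} \int_{\text{all } x} f(x)\,dx &= 1 \\ \int_0^4 kx^2\,dx &= 1 \\ k\left[\frac{x^3}{3}\right]_0^4 &= 1 \\ \frac{64k}{3} &= 1 \\ k &= \frac{3}{64} \end{aligned}$$

(b)
$$\begin{aligned} P(1 < X < 3) &= \int_1^3 \frac{3}{64}x^2\,dx = \frac{3}{64}\left[\frac{x^3}{3}\right]_1^3 = \frac{27 - 1}{64} \\ &= 0.40625 \\ &= 0.41 \text{ (2 s.f.)} \end{aligned}$$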
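Finding an unknown constant k, as in Example 6.3, amounts to dividing by the area under the unscaled curve. A minimal Python sketch of that step follows, assuming the p.d.f. is kx² on 0 < x < 4; `integrate_poly` is a hypothetical helper defined in the block, not a library function.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    a, b = F(a), F(b)
    return sum(F(c) * (b ** (i + 1) - a ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

shape = [0, 0, 1]                      # x**2, i.e. the p.d.f. with k = 1
k = 1 / integrate_poly(shape, 0, 4)    # choose k so that the total area is 1
f = [0, 0, k]                          # f(x) = k x**2

p = integrate_poly(f, 1, 3)            # P(1 < X < 3)
print(k, p, float(p))                  # 3/64 13/32 0.40625
```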


Example 6.4 

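The continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} k(x+2)^2 & -2 < x < 0 \\ 4k & 0 < x < 1\tfrac13 \\ 0 & \text{otherwise} \end{cases}$$

(a) Find the value of the constant k.

(b) Sketch y = f(x).

(c) Find P(−1 < X < 1).

(d) Find P(X > 1).

Solution 6.4

(a) To find k, you need to use the result
$$\int_{\text{all } x} f(x)\,dx = 1$$
f(x) has been given in two parts, so you will need to calculate two separate integrals, as follows:
$$\begin{aligned} \int_{-2}^{0} k(x+2)^2\,dx + \int_0^{4/3} 4k\,dx &= 1 \\ \frac{k}{3}\left[(x+2)^3\right]_{-2}^{0} + \left[4kx\right]_0^{4/3} &= 1 \\ \frac{8k}{3} + \frac{16k}{3} &= 1 \\ 8k &= 1 \\ k &= \frac18 \end{aligned}$$

(b) The p.d.f. of X is
$$f(x) = \begin{cases} \tfrac18(x+2)^2 & -2 < x < 0 \\ \tfrac12 & 0 < x < 1\tfrac13 \\ 0 & \text{otherwise} \end{cases}$$

[Sketch of y = f(x): a curve rising from 0 at x = −2 to ½ at x = 0, followed by a horizontal line at height ½ from x = 0 to x = 1⅓]

(c) P(−1 < X < 1) is given by the shaded area. It must be found in two stages:
$$P(-1 < X < 0) = \int_{-1}^{0} \tfrac18(x+2)^2\,dx = \tfrac{1}{24}\left[(x+2)^3\right]_{-1}^{0} = \tfrac{8 - 1}{24} = \tfrac{7}{24}$$
and P(0 < X < 1) = area of rectangle = 1 × ½ = ½, so
$$P(-1 < X < 1) = \tfrac{7}{24} + \tfrac12 = \tfrac{19}{24}$$

(d) From the diagram,
$$P(X > 1) = \text{area of shaded rectangle} = \left(1\tfrac13 - 1\right) \times \tfrac12 = \tfrac16$$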
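A piecewise p.d.f. is handled one piece at a time. The Python sketch below is an illustrative check of Example 6.4, not the book's method. It assumes the definition k(x + 2)² on −2 < x < 0 and 4k on 0 < x < 1⅓ (the upper limit consistent with 8k = 1), expands k(x + 2)² as k(x² + 4x + 4), and uses a hypothetical helper `integrate_poly`.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    a, b = F(a), F(b)
    return sum(F(c) * (b ** (i + 1) - a ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

curve = [4, 4, 1]      # (x + 2)**2 = 4 + 4x + x**2, on -2 < x < 0 (with k = 1)
flat = [4]             # 4, on 0 < x < 4/3 (with k = 1)
upper = F(4, 3)

# Total area with k = 1, then scale so the total probability is 1
k = 1 / (integrate_poly(curve, -2, 0) + integrate_poly(flat, 0, upper))

p_c = k * (integrate_poly(curve, -1, 0) + integrate_poly(flat, 0, 1))  # P(-1 < X < 1)
p_d = k * integrate_poly(flat, 1, upper)                               # P(X > 1)
print(k, p_c, p_d)  # 1/8 19/24 1/6
```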


Exercise 6a Calculating probabilities 


1. The continuous random variable X has a p.d.f. f(x) where f(x) = kx², 0 < x < 2.
(a) Find the value of the constant k.
(b) Find P(X > 1).
(c) Find P(0.5 < X < 1.5).

2. The continuous random variable X has p.d.f. f(x) where f(x) = k, −2 < x < 3.
(a) Sketch y = f(x).
(b) Find the value of the constant k.
(c) Find P(−1.6 < X < 2.1).




The continuous random variable X has p.d.f. f(x) where f(x) = k(4 − x), 1 < x < 3.

(a) Find the value of the constant k.

(b) Sketch y = f(x).

(c) Find P(1.2 < X < 2.4).


4. The continuous random variable X has p.d.f. f(x) where f(x) = k(x + 2), 0 < x < 2.
(a) Find the value of the constant k.
(b) Find P(0 < X < 1) and hence find P(X > 1).
5. The continuous random variable X has p.d.f. f(x) where f(x) = kx², 0 < x < c, and P(X < …) = … . Find the values of the constants c and k.

6. A continuous random variable has p.d.f. f(x) where f(x) = kx, 0 < x < 4.
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find P(1 < X < 2.5).

7. The continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} k & \ldots \\ k(2x - 3) & 2 < x < 3 \\ 0 & \text{otherwise} \end{cases}$$
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find P(X < 1).
(d) Find P(X > 2.5).


EXPECTATION OF X, E(X)
For a continuous random variable with p.d.f. f(x),
$$E(X) = \int_{\text{all } x} x f(x)\,dx$$
E(X) is referred to as the mean or expectation of X and is often denoted by μ.


Example 6.5 
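The sketch shows the p.d.f. of X where f(x) = x²/9, 0 < x < 3.

(a) Find μ, the mean of X.
(b) Find P(X < μ).

[Sketch of y = f(x) = x²/9 for 0 < x < 3]

Solution 6.5
(a)
$$\mu = E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^3 \frac{x^3}{9}\,dx = \left[\frac{x^4}{36}\right]_0^3 = \frac{81}{36} = 2.25$$
(b) P(X < μ) = P(X < 2.25)
$$= \int_0^{2.25} \frac{x^2}{9}\,dx = \left[\frac{x^3}{27}\right]_0^{2.25} = 0.421875 = 0.42 \text{ (2 s.f.)}$$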



If f(x) has a line of symmetry in the specified range, then E(X) can be found directly as in the 
following example. 


Example 6.6 


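A continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} 0.25x & 0 < x < 2 \\ 1 - 0.25x & 2 < x < 4 \\ 0 & \text{otherwise} \end{cases}$$

Sketch y = f(x) and find E(X).

Solution 6.6
Sketch of y = f(x)

[Sketch: the line y = 0.25x rising from 0 at x = 0 to 0.5 at x = 2, then the line y = 1 − 0.25x falling to 0 at x = 4]

From the sketch, you can see that there is symmetry about x = 2, so

E(X) = 2

Check by integration:
$$\begin{aligned} E(X) &= \int_{\text{all } x} x f(x)\,dx \\ &= \int_0^2 x \times 0.25x\,dx + \int_2^4 x(1 - 0.25x)\,dx \\ &= 0.25\left[\frac{x^3}{3}\right]_0^2 + \left[\frac{x^2}{2} - 0.25\,\frac{x^3}{3}\right]_2^4 \\ &= \frac23 + \frac43 \\ &= 2 \end{aligned}$$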
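The symmetry argument in Example 6.6 can also be confirmed numerically. In this illustrative Python sketch each piece of the p.d.f. is stored as a (coefficients, lower limit, upper limit) tuple, a representation chosen here for convenience; `integrate_poly` is a hypothetical helper defined in the block.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    a, b = F(a), F(b)
    return sum(F(c) * (b ** (i + 1) - a ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

# f(x) = 0.25x on 0 < x < 2 and 1 - 0.25x on 2 < x < 4
pieces = [([0, F(1, 4)], 0, 2), ([1, F(-1, 4)], 2, 4)]

total = sum(integrate_poly(c, a, b) for c, a, b in pieces)
mean = sum(integrate_poly([0] + c, a, b) for c, a, b in pieces)  # integral of x f(x)
print(total, mean)  # 1 2
```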


Example 6.7 


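A teacher of young children is thinking of asking her class to guess her height in metres. The teacher considers that the height guessed by a randomly selected child can be modelled by the random variable H with probability density function
$$f(h) = \begin{cases} \tfrac{3}{16}(4h - h^2) & 0 < h < 2 \\ 0 & \text{otherwise} \end{cases}$$
Using this model,

(a) find P(H < 1),
(b) show that E(H) = 1.25.

A friend of the teacher suggests that the random variable X with probability density function
$$g(x) = \begin{cases} kx^3 & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$
where k is a positive constant, might be a more suitable model.

(c) Show that k = ¼.
(d) Find P(X < 1).
(e) Find E(X).
(f) Using your calculations in (a), (b), (d) and (e), state, giving reasons, which of the random variables H or X is likely to be the more appropriate model in this instance. (L)

Solution 6.7

[Sketches of y = f(h) for 0 < h < 2 and of y = g(x) for 0 < x < 2]

(a)
$$P(H < 1) = \int_0^1 \tfrac{3}{16}(4h - h^2)\,dh = \tfrac{3}{16}\left[2h^2 - \tfrac{h^3}{3}\right]_0^1 = \tfrac{3}{16} \times \tfrac53 = 0.3125$$
(b)
$$E(H) = \int_0^2 h f(h)\,dh = \tfrac{3}{16}\int_0^2 (4h^2 - h^3)\,dh = \tfrac{3}{16}\left[\tfrac{4h^3}{3} - \tfrac{h^4}{4}\right]_0^2 = \tfrac{3}{16}\left(\tfrac{32}{3} - 4\right) = 1.25$$
(c)
$$\int_0^2 kx^3\,dx = 1 \;\Rightarrow\; k\left[\tfrac{x^4}{4}\right]_0^2 = 4k = 1 \;\Rightarrow\; k = \tfrac14$$
(d)
$$P(X < 1) = \int_0^1 \tfrac14 x^3\,dx = \left[\tfrac{x^4}{16}\right]_0^1 = 0.0625$$
(e)
$$E(X) = \int_0^2 \tfrac14 x^4\,dx = \left[\tfrac{x^5}{20}\right]_0^2 = \tfrac{32}{20} = 1.6$$
(f) For H, P(H < 1) = 0.3125, so about 31% of children guess the teacher’s height to be less than 1 m (i.e. 3 ft 3 in).
E(H) = 1.25, so the average guess for the height of the teacher is 1.25 m (i.e. 4 ft 1 in).
For X, P(X < 1) = 0.0625, so only about 6% of children guess the height to be less than 3 ft 3 in.
E(X) = 1.6, so the average guess for the height of the teacher is 1.6 m (i.e. 5 ft 3 in).
X is the more appropriate model, since its average guess of 1.6 m is a far more realistic height for an adult than 1.25 m.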
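The comparison in part (f) of Example 6.7 can be reproduced by computing the same two summaries for each model. This Python sketch is illustrative only; it assumes f(h) = (3/16)(4h − h²) and g(x) = kx³ on (0, 2), and `integrate_poly` is a hypothetical helper defined in the block.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    a, b = F(a), F(b)
    return sum(F(c) * (b ** (i + 1) - a ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

# Model H: f(h) = (3/16)(4h - h**2) = (3/4)h - (3/16)h**2 on 0 < h < 2
f_h = [0, F(3, 4), F(-3, 16)]
# Model X: g(x) = k x**3 on 0 < x < 2, with k fixed by total area 1
k = 1 / integrate_poly([0, 0, 0, 1], 0, 2)
g_x = [0, 0, 0, k]

results = {}
for name, coeffs in [("H", f_h), ("X", g_x)]:
    below_1m = integrate_poly(coeffs, 0, 1)      # P(guess < 1 m)
    mean = integrate_poly([0] + coeffs, 0, 2)    # mean guess, in metres
    results[name] = (below_1m, mean)
    print(name, float(below_1m), float(mean))
# H 0.3125 1.25
# X 0.0625 1.6
```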


Exercise 6b Expectation E(X)

1. Find E(X) for each of the following continuous random variables.
(a) f(x) = ¾(x² + 1), 0 < x < 1.
(b) f(x) = …(2 − x), 0 < x < 2.
(c) f(x) = …(6 − x), 0 < x < 6.
(d) f(x) = k…, 0 < x < 2.
(e) f(x) = …, 0 < x < 2; f(x) = …x(4 − x), 2 < x < 4; f(x) = 0 otherwise.

2. The continuous random variable X has p.d.f. f(x) where f(x) = … for 0 < x < 1, … for 1 < x < 3, …(4 − x) for 3 < x < 4, and 0 otherwise.
(a) Draw a sketch of y = f(x).
(b) Find k.
(c) Find E(X).

3. X is a continuous random variable with p.d.f. f(x) = kx², 0 < x < 4. Find E(X).

4. In a game a wooden block is propelled with a stick across a flat deck. On each attempt the distance, x metres, reached by the block lies between 0 and 10 m, and the variation is modelled by the probability density function
f(x) = 0.0012x²(10 − x).
Calculate the mean distance reached by the block. (SMP)
5. The continuous random variable X has the probability density function f given by f(x) = kx, 5 < x < 10, f(x) = 0 otherwise.
(a) Find the value of k.
(b) Find the expected value of X.
(c) Find the probability that X > 8.
The annual income from money invested in a Unit Trust Fund is X per cent of the amount invested, where X has the above distribution. Suppose that you have a sum of money to invest and that you are prepared to leave the money invested over a period of several years. State, with your reasons, whether you would invest in the Unit Trust Fund or in a Money Bond offering a guaranteed annual income of 8% on the money invested. (NEAB)

6. The lifetime X in tens of hours of a torch battery is a random variable with probability density function
$$f(x) = \begin{cases} \tfrac34\left(1 - (x - 2)^2\right) & 1 < x < 3 \\ 0 & \text{otherwise} \end{cases}$$
Calculate the mean of X.
A torch runs on two batteries, both of which have to be working for the torch to function. If two new batteries are put in the torch, what is the probability that the torch will function for at least 22 hours, on the assumption that the lifetimes of the batteries are independent? (O & C)

7. A random variable X has a probability density function f given by
$$f(x) = \begin{cases} cx(5 - x) & 0 < x < 5 \\ 0 & \text{otherwise} \end{cases}$$
Show that c = 6/125 and find the mean of X.
The lifetime X in years of an electric light bulb has this distribution. Given that a lamp standard is fitted with two such new bulbs and that their failures are independent, find the probability that neither bulb fails in the first year and the probability that exactly one bulb fails within two years. (MEI)

8. The mass, X kg, of a particular substance produced per hour in a chemical process is a continuous random variable whose probability density function is given by
$$f(x) = \begin{cases} \tfrac{3}{32}x^2 & 0 < x < 2 \\ \tfrac{3}{32}(6 - x) & 2 < x < 6 \\ 0 & \text{otherwise} \end{cases}$$
(a) Find the mean mass produced per hour.
(b) The substance produced is sold at £2 per kilogram and the total running cost of the process is £1 per hour. Find the expected profit per hour and the probability that in an hour the profit will exceed £7. (NEAB)

9. A continuous random variable X has the probability density function f defined by
f(x) = …,
f(x) = c, 3 < x < 4,
f(x) = 0 otherwise,
where c is a positive constant. Find
(a) the value of c,
(b) …,
(c) the value, a, for there to be a probability of 0.85 that a randomly observed value of X will exceed a. (NEAB)

THE EXPECTATION OF ANY FUNCTION OF X

If g(x) is any function of the continuous random variable, X, having p.d.f. f(x), then
$$E[g(X)] = \int_{\text{all } x} g(x) f(x)\,dx$$
In particular,
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx$$

As in the case of the discrete random variable (see pages 246 and 248), the following results also hold when X is continuous; a and b are constants:

1. E(a) = a
2. E(aX) = aE(X)
3. E(aX + b) = aE(X) + b
4. E[g(X) + h(X)] = E[g(X)] + E[h(X)]

Example 6.8

The continuous random variable X has p.d.f. f(x) where f(x) = (1/20)(x + 3), 0 < x < 4.

(a) Find E(X).
(b) Find E(2X + 5).
(c) Find E(X²).
(d) Find E(X² + 2X − 3).

Solution 6.8

(a)
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \frac{1}{20}\int_0^4 (x^2 + 3x)\,dx = \frac{1}{20}\left[\frac{x^3}{3} + \frac{3x^2}{2}\right]_0^4 = 2.266\ldots = 2.3 \text{ (2 s.f.)}$$
(b) E(2X + 5) = E(2X) + 5 (Result 3)
= 2E(X) + 5 (Result 2)
= 2(2.266...) + 5
= 9.533...
= 9.5 (2 s.f.)
(c)
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \frac{1}{20}\int_0^4 (x^3 + 3x^2)\,dx = \frac{1}{20}\left[\frac{x^4}{4} + x^3\right]_0^4 = 6.4$$
(d) E(X² + 2X − 3) = E(X²) + E(2X) − E(3) (Result 4)
= 6.4 + 2(2.266...) − 3
= 7.933...
= 7.9 (2 s.f.)
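The rule E[g(X)] = ∫ g(x) f(x) dx becomes a polynomial product followed by an integral when g and f are both polynomials. The Python sketch below is an illustrative check of Example 6.8, assuming f(x) = (x + 3)/20 on 0 < x < 4; the helpers `integrate_poly` and `expect` are hypothetical names introduced here.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    a, b = F(a), F(b)
    return sum(F(c) * (b ** (i + 1) - a ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

f = [F(3, 20), F(1, 20)]   # f(x) = (x + 3)/20 on 0 < x < 4

def expect(g):
    """E[g(X)] for a polynomial g: integrate g(x) f(x) over 0 < x < 4."""
    prod = [F(0)] * (len(g) + len(f) - 1)   # coefficients of g(x) * f(x)
    for i, gi in enumerate(g):
        for j, fj in enumerate(f):
            prod[i + j] += F(gi) * fj
    return integrate_poly(prod, 0, 4)

e_x = expect([0, 1])          # E(X)
e_2x5 = expect([5, 2])        # E(2X + 5)
e_x2 = expect([0, 0, 1])      # E(X^2)
e_d = expect([-3, 2, 1])      # E(X^2 + 2X - 3)
print(e_x, e_2x5, e_x2, e_d)  # 34/15 143/15 32/5 119/15
```

Computing E(2X + 5) directly gives exactly 2E(X) + 5, which is a useful numerical confirmation of Result 3.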
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Example 6.9 


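The mass, X kg, of a particular substance produced in one hour in a chemical process is modelled by a continuous random variable with probability density function given by
$$f(x) = \begin{cases} \frac{3}{32}x^2 & 0 \le x \le 2 \\ \frac{3}{32}(6 - x) & 2 < x \le 6 \\ 0 & \text{otherwise} \end{cases}$$

(a) Sketch the graph of f.
(b) Find P(X < 4).
(c) Find the mean mass produced per hour.
(d) The substance is sold at £100 per kilogram and the running cost of the process is £20 per hour. Taking £Y as the profit made in each hour, express Y in terms of X.
(e) Find the expected value of Y.

Solution 6.9

(a) [Sketch of y = f(x): the curve $f(x) = \frac{3}{32}x^2$ rises from 0 to a peak of $\frac38$ at x = 2; the line $f(x) = \frac{3}{32}(6 - x)$ then falls to 0 at x = 6.]

(b)
$$P(X < 4) = \int_0^2 \frac{3}{32}x^2\,dx + \int_2^4 \frac{3}{32}(6 - x)\,dx = \frac{3}{32}\left[\frac{x^3}{3}\right]_0^2 + \frac{3}{32}\left[6x - \frac{x^2}{2}\right]_2^4$$
$$= \frac14 + \frac{3}{32}\big((24 - 8) - (12 - 2)\big) = \frac{13}{16}$$

(c)
$$E(X) = \int_0^2 \frac{3}{32}x^3\,dx + \int_2^6 \frac{3}{32}(6x - x^2)\,dx = \frac{3}{32}\left[\frac{x^4}{4}\right]_0^2 + \frac{3}{32}\left[3x^2 - \frac{x^3}{3}\right]_2^6$$
$$= \frac38 + \frac{3}{32}\left((108 - 72) - \left(12 - \frac83\right)\right) = 2\tfrac78$$

(d) Y = 100X − 20

(e)
$$E(Y) = E(100X - 20) = 100E(X) - 20 = 267\tfrac12$$

So the expected profit is £267.50.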
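For a p.d.f. defined in pieces, each integral in Solution 6.9 splits at the join x = 2. The sketch below is a minimal check, assuming the standard library only. It splits the range at x = 2, applies Simpson's rule to each piece and reproduces P(X < 4), E(X) and E(Y). The helper names are illustrative.

```python
def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (n must be even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def f(x):
    # p.d.f. from Example 6.9 (mass X kg produced per hour)
    if 0 <= x <= 2:
        return 3 * x * x / 32
    if 2 < x <= 6:
        return 3 * (6 - x) / 32
    return 0.0


def integral(g, lo, hi):
    # Split at the join x = 2 so that each Simpson integral sees a single polynomial piece.
    cuts = [lo] + [c for c in (2.0,) if lo < c < hi] + [hi]
    return sum(simpson(g, a, b) for a, b in zip(cuts, cuts[1:]))


p_less_than_4 = integral(f, 0.0, 4.0)                                 # P(X < 4)
mean = integral(lambda x: x * f(x), 0.0, 6.0)                         # E(X)
expected_profit = integral(lambda x: (100 * x - 20) * f(x), 0.0, 6.0)  # E(Y), Y = 100X - 20

print(round(p_less_than_4, 4), round(mean, 4), round(expected_profit, 2))
# 0.8125 2.875 267.5
```

Integrating (100x − 20)f(x) directly gives the same £267.50 as 100E(X) − 20, which is Result 3.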


Example 6.10 


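The continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} \frac67 x & 0 \le x < 1 \\ \frac67 x(2 - x) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

Find E(X²).

Solution 6.10

$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^1 \frac67 x^3\,dx + \int_1^2 \frac67 x^3(2 - x)\,dx$$
$$= \frac67\int_0^1 x^3\,dx + \frac67\int_1^2 (2x^3 - x^4)\,dx = \frac67\left[\frac{x^4}{4}\right]_0^1 + \frac67\left[\frac{x^4}{2} - \frac{x^5}{5}\right]_1^2$$
$$= 1.328\ldots = 1.3 \text{ (2 s.f.)}$$

NOTE: E(X²) is an important value which is needed when calculating the variance of X.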


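VARIANCE OF X, Var(X)

For a random variable X,
$$\operatorname{Var}(X) = E\big((X - \mu)^2\big), \quad \text{where } \mu = E(X).$$
As in the discrete case (see page 249), the formula can be written
$$\operatorname{Var}(X) = E(X^2) - E^2(X) = E(X^2) - \mu^2.$$
If X is a continuous random variable with p.d.f. f(x), then
$$\operatorname{Var}(X) = \int_{\text{all } x} x^2 f(x)\,dx - \mu^2, \quad \text{where } \mu = \int_{\text{all } x} x f(x)\,dx.$$

The standard deviation of X is often written as σ, so $\sigma = \sqrt{\operatorname{Var}(X)}$.

As in the case of the discrete random variable (see page 250), the following results also hold when X is continuous, where a and b are constants:

1. $\operatorname{Var}(a) = 0$
2. $\operatorname{Var}(aX) = a^2\operatorname{Var}(X)$
3. $\operatorname{Var}(aX + b) = a^2\operatorname{Var}(X)$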


Example 6.11 


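The continuous random variable X has p.d.f. f(x) where $f(x) = \frac18 x$, $0 \le x \le 4$. Find

(a) E(X),
(b) E(X²),
(c) Var(X),
(d) σ, the standard deviation of X,
(e) Var(3X + 2).

Solution 6.11

(a)
$$E(X) = \int_0^4 x f(x)\,dx = \frac18\int_0^4 x^2\,dx = \frac18\left[\frac{x^3}{3}\right]_0^4 = 2.666\ldots$$
Note that f(x) is not symmetrical, so E(X) has to be found by integration.

(b)
$$E(X^2) = \int_0^4 x^2 f(x)\,dx = \frac18\int_0^4 x^3\,dx = \frac18\left[\frac{x^4}{4}\right]_0^4 = 8$$

(c) Var(X) = E(X²) − E²(X) = 8 − (2.666…)² = 0.888… = 0.89 (2 s.f.)

(d) σ = √Var(X) = √0.888… = 0.9428… = 0.94 (2 s.f.)

(e) Var(3X + 2) = 9 Var(X) (using variance result 3) = 9(0.888…) = 8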
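Variance result 3 says that adding a constant does not change the spread. The sketch below is a minimal numerical check, assuming the standard library only. For the p.d.f. $f(x) = \frac18 x$ of Example 6.11 it finds Var(X) from E(X²) − E²(X). It then finds Var(3X + 2) separately, straight from the definition E((Y − E(Y))²), and the two agree with Var(3X + 2) = 9 Var(X). The helper names are illustrative.

```python
import math


def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (n must be even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def f(x):
    # p.d.f. from Example 6.11: f(x) = x/8 for 0 <= x <= 4
    return x / 8


def expect(g):
    # E(g(X)) = integral of g(x) f(x) over 0 <= x <= 4
    return simpson(lambda x: g(x) * f(x), 0.0, 4.0)


mean = expect(lambda x: x)
var = expect(lambda x: x * x) - mean ** 2     # Var(X) = E(X^2) - E^2(X)
sd = math.sqrt(var)

# Var(3X + 2) straight from the definition E((Y - E(Y))^2), with Y = 3X + 2
mean_y = expect(lambda x: 3 * x + 2)
var_y = expect(lambda x: (3 * x + 2 - mean_y) ** 2)

print(round(mean, 4), round(var, 4), round(sd, 4), round(var_y, 4))
# 2.6667 0.8889 0.9428 8.0
```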


Example 6.12 


As an experiment a temporary roundabout is installed at the crossroads. The time, X minutes, 
which vehicles have to wait before entering the roundabout has probability density function 


$$f(x) = \begin{cases} 0.8 - 0.32x & 0 \le x \le 2.5 \\ 0 & \text{otherwise} \end{cases}$$

Find the mean and the standard deviation of X.

Solution 6.12

$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^{2.5} (0.8x - 0.32x^2)\,dx = \left[0.8\frac{x^2}{2} - 0.32\frac{x^3}{3}\right]_0^{2.5}$$
$$= 0.833\ldots \text{ minutes} = 50 \text{ seconds}$$

The mean time is 50 seconds.

$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^{2.5} (0.8x^2 - 0.32x^3)\,dx = \left[0.8\frac{x^3}{3} - 0.32\frac{x^4}{4}\right]_0^{2.5} = 1.041\ldots$$

Var(X) = E(X²) − E²(X) = 1.041… − (0.833…)² = 0.347…

s.d. of X = √0.347… = 0.589… minutes = 35 seconds (2 s.f.)
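The same two integrals can be evaluated numerically and converted to seconds, as in Solution 6.12. This is a minimal sketch, assuming the standard library only. The integrands are polynomials of degree 3 or less, so Simpson's rule reproduces them up to rounding error.

```python
import math


def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (n must be even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def f(x):
    # p.d.f. from Example 6.12 (waiting time in minutes), valid for 0 <= x <= 2.5
    return 0.8 - 0.32 * x


mean = simpson(lambda x: x * f(x), 0.0, 2.5)        # E(X) in minutes
e_x2 = simpson(lambda x: x * x * f(x), 0.0, 2.5)    # E(X^2)
sd = math.sqrt(e_x2 - mean ** 2)                    # standard deviation in minutes

print(f"mean = {mean * 60:.1f} s, s.d. = {sd * 60:.1f} s")
# mean = 50.0 s, s.d. = 35.4 s
```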
THE MODE


The mode is the value of X for which f(x) is greatest in the given range of X. 
To locate the mode it is a good idea to draw a sketch. Sometimes the mode can be deduced 
immediately. 


[Sketch: a p.d.f. on 0 ≤ x ≤ 4 whose greatest value is at x = 4.]
Mode is 4

For some probability density functions you will need to determine the maximum point on the curve y = f(x) using the fact that, at a maximum point, f′(x) = 0, where $f'(x) = \frac{d}{dx}f(x)$.

Note that a maximum point is confirmed if f″(x) < 0, where $f''(x) = \frac{d^2}{dx^2}f(x)$.

Example 6.13

X has p.d.f. defined by $f(x) = \frac{3}{80}(2 + x)(4 - x)$, for $0 \le x \le 4$, and is illustrated in the diagram.

Find the mode.

[Diagram: the curve $y = \frac{3}{80}(2 + x)(4 - x)$ for 0 ≤ x ≤ 4.]


Solution 6.13

$$f(x) = \frac{3}{80}(2 + x)(4 - x) = \frac{3}{80}(8 + 2x - x^2)$$

The mode is the value of x at the maximum point.
Differentiate to find f′(x):
$$f'(x) = \frac{3}{80}(2 - 2x)$$
f′(x) = 0 when x = 1.

Differentiate again to find f″(x):
$$f''(x) = \frac{3}{80} \times (-2) = -\frac{3}{40}$$
so f″(x) < 0 for all values of x, indicating that there is a maximum point when x = 1.

The mode = 1


Example 6.14 
A random variable X has a probability density function 


$$f(x) = \begin{cases} Ax(6 - x)^2 & 0 \le x \le 6 \\ 0 & \text{elsewhere.} \end{cases}$$


(a) Find the value of the constant A. 
(b) Calculate 
(i) the mean, (ii) the mode, 
(iii) the variance, (iv) the standard deviation of X. 


(AEB) 


Solution 6.14

(a) Since X is a random variable, $\int_{\text{all } x} f(x)\,dx = 1$.
$$1 = \int_0^6 Ax(6 - x)^2\,dx = A\int_0^6 (36x - 12x^2 + x^3)\,dx = A\left[18x^2 - 4x^3 + \frac{x^4}{4}\right]_0^6 = 108A$$
$$A = \frac{1}{108}$$
[Sketch: $y = \frac{1}{108}x(6 - x)^2$ for 0 ≤ x ≤ 6.]

(b) $f(x) = \frac{1}{108}x(6 - x)^2$, $0 \le x \le 6$

(i) The mean is E(X), where $E(X) = \int_{\text{all } x} x f(x)\,dx$.
$$E(X) = \frac{1}{108}\int_0^6 x^2(6 - x)^2\,dx = \frac{1}{108}\int_0^6 (36x^2 - 12x^3 + x^4)\,dx = \frac{1}{108}\left[12x^3 - 3x^4 + \frac{x^5}{5}\right]_0^6 = 2.4$$

(ii) To find the mode, find the value of x for which f(x) is a maximum, 0 ≤ x ≤ 6.
$$f(x) = \frac{1}{108}(36x - 12x^2 + x^3)$$
Differentiating,
$$f'(x) = \frac{1}{108}(36 - 24x + 3x^2) = \frac{1}{36}(6 - x)(2 - x)$$
f′(x) = 0 when x = 2 and when x = 6.
$$f''(x) = \frac{1}{108}(6x - 24)$$
To check for a maximum or minimum, consider f″(x).
When x = 2, f″(x) < 0, and when x = 6, f″(x) > 0.

f(x) is a maximum when x = 2, so the mode is 2.

(iii) To find the variance of X, first find E(X²).
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \frac{1}{108}\int_0^6 (36x^3 - 12x^4 + x^5)\,dx = \frac{1}{108}\left[9x^4 - \frac{12x^5}{5} + \frac{x^6}{6}\right]_0^6 = 7.2$$
$$\operatorname{Var}(X) = E(X^2) - E^2(X) = 7.2 - (2.4)^2 = 1.44$$

(iv) Standard deviation of X = √Var(X) = √1.44 = 1.2
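When f(x) rises to a single peak and then falls, as in Example 6.14, the mode can also be located numerically instead of by solving f′(x) = 0. The sketch below uses ternary search, a standard technique for unimodal functions that is not used in the text, and assumes the standard library only. The function name `ternary_search_max` is illustrative.

```python
def f(x):
    # p.d.f. from Example 6.14: f(x) = x(6 - x)^2 / 108 for 0 <= x <= 6
    return x * (6 - x) ** 2 / 108


def ternary_search_max(g, lo, hi, iterations=200):
    """Locate the maximum of a function that rises then falls on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1   # the maximum cannot lie in [lo, m1]
        else:
            hi = m2   # the maximum cannot lie in [m2, hi]
    return (lo + hi) / 2


mode = ternary_search_max(f, 0.0, 6.0)
print(round(mode, 6))
```

The search agrees with the calculus result x = 2. It relies on f having only one peak in the range; a p.d.f. with several local maxima would need a sketch first, as the text advises.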


Example 6.15

The time taken to perform a particular task, t hours, has the probability density function
$$f(t) = \begin{cases} 10ct^2 & 0 \le t < 0.6 \\ 9c(1 - t) & 0.6 \le t \le 1.0 \\ 0 & \text{otherwise,} \end{cases}$$
where c is a constant.

(a) Find the value of c and sketch the graph of this distribution.
(b) Write down the most likely time.
(c) Find the expected time.
(d) Determine the probability that the time will be
(i) more than 48 minutes,
(ii) between 24 and 48 minutes.

Solution 6.15

(a)
$$1 = \int_{\text{all } t} f(t)\,dt = 10c\int_0^{0.6} t^2\,dt + 9c\int_{0.6}^{1.0} (1 - t)\,dt = 10c\left[\frac{t^3}{3}\right]_0^{0.6} + 9c\left[t - \frac{t^2}{2}\right]_{0.6}^{1.0}$$
$$= 0.72c + 0.72c = 1.44c$$
$$c = \frac{1}{1.44} = \frac{25}{36}$$
The probability density function is
$$f(t) = \begin{cases} \frac{125}{18}t^2 & 0 \le t < 0.6 \\ \frac{25}{4}(1 - t) & 0.6 \le t \le 1.0 \\ 0 & \text{otherwise} \end{cases}$$
[Sketch of y = f(t): rising from 0 to a peak of 2.5 at t = 0.6, then falling linearly to 0 at t = 1.0.]

(b) From the sketch, t = 0.6 gives the maximum value of f(t).
Therefore the mode is 0.6 hours = 36 minutes.
The most likely time is 36 minutes.

(c)
$$E(T) = \int_{\text{all } t} t f(t)\,dt = 10c\int_0^{0.6} t^3\,dt + 9c\int_{0.6}^{1.0} (t - t^2)\,dt = 10c\left[\frac{t^4}{4}\right]_0^{0.6} + 9c\left[\frac{t^2}{2} - \frac{t^3}{3}\right]_{0.6}^{1.0}$$
$$= 0.225 + 0.366\ldots = 0.591\ldots \text{ hours} = 35.5 \text{ minutes}$$
The expected time is 35.5 minutes.

(d) (i) 48 minutes = 0.8 hours
$$P(T > 0.8) = 9c\int_{0.8}^{1.0} (1 - t)\,dt = 9c\left[t - \frac{t^2}{2}\right]_{0.8}^{1.0} = 0.125$$
The probability that the time will be more than 48 minutes is 0.125.

(ii) 24 minutes = 0.4 hours
$$P(0.4 < T < 0.8) = 1 - P(T > 0.8) - P(T < 0.4)$$
$$P(T < 0.4) = 10c\int_0^{0.4} t^2\,dt = 10c\left[\frac{t^3}{3}\right]_0^{0.4} = 0.1481\ldots$$
$$P(0.4 < T < 0.8) = 1 - 0.125 - 0.1481\ldots = 0.727 \text{ (3 s.f.)}$$
The probability that the time will be between 24 and 48 minutes is 0.727.

Exercise 6c Standard deviation and variance

In questions 1 to 7, find (a) E(X), (b) E(X²), (c) Var(X), (d) the standard deviation of X. Each p.d.f. is zero outside the range(s) stated. Do not forget to look for …

NOTE: some of these functions were given in Exercise 6a and you may wish to refer to your previous sketches.

1. $f(x) = \frac38 x^2$, $0 \le x \le 2$
2. …
3. …
4. $f(x) = \frac16(x + 2)$, $0 \le x \le 2$
5. $f(x) = 4x^3$, $0 \le x \le 1$
6. $f(x) = \ldots(2x - 3)$, $2 \le x \le 3$
7. $f(x) = \begin{cases} \ldots(x + 2)^2 & -2 \le x < 0 \\ \ldots & 0 \le x \le 1 \end{cases}$


8. A continuous random variable X has p.d.f. $f(x) = kx^2$, $0 \le x \le 4$.
(a) Find the value of k, and sketch y = f(x).
(b) Find E(X) and Var(X).
(c) Find P(1 < X < 2).

9. A continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} kx & 0 \le x < 1 \\ k(2 - x) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Find
(a) the value of the constant k,
(b) E(X), (c) Var(X),
(d) P(½ < X < 1½), (e) the mode.

10. The continuous random variable X has p.d.f. given by f(x) where
$$f(x) = \begin{cases} \frac{1}{27}x^2 & 0 \le x < 3 \\ \frac13 & 3 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$
(a) Sketch y = f(x).
(b) Find E(X).
(c) Find E(X²).
(d) Find the standard deviation σ of X.

11. A continuous random variable X has a probability density function f given by
$$f(x) = \frac{k}{x(4 - x)}, \quad 1 \le x \le 3,$$
$$f(x) = 0 \text{ otherwise.}$$
(a) Show that $k = \frac{2}{\ln 3}$.
(b) Calculate the mean and the variance of X. (NEAB)

12. The probability density function of X is given by
$$f(x) = \begin{cases} k(ax - x^2) & 0 \le x \le 2 \\ 0 & x < 0,\ x > 2 \end{cases}$$
where k and a are positive constants.
Show that a ≥ 2 and that $k = \frac{3}{6a - 8}$.
Given that the mean value of X is 1, calculate the values of a and k.
For these values of a and k sketch the graph of the probability density function and find the variance of X. (NEAB)

13. A continuous random variable X has probability density function f(x) defined by
$$f(x) = \begin{cases} 12(x^2 - x^3) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Find the mean and standard deviation of X. (O & C)


THE CUMULATIVE DISTRIBUTION FUNCTION, F(x)

In Chapter 4 (page 253) you met the idea of a cumulative distribution function, F(x), for a discrete random variable, and in Chapter 5 (pages 283 and 294) you used cumulative probability tables giving F(r) = P(X ≤ r) for binomial and Poisson distributions.

In the same way, if X is a continuous random variable with p.d.f. f(x), you can find the cumulative distribution function F(x).

For a particular value, t, in the range of the function,
$$F(t) = P(X \le t) = \int_{-\infty}^{t} f(x)\,dx.$$

The lower limit is given as −∞, but in practice it is the smallest possible value of x in the range for which f(x) is valid.
So if f(x) is valid in the range a ≤ x ≤ b, then
$$F(t) = \int_a^t f(x)\,dx, \quad \text{with lower limit } a.$$

[Sketch: y = f(x) with the area under the curve from a up to t shaded.]

Remember that F(t) gives the area under the curve f(x) up to a particular value t.
Notice that
$$F(b) = P(X \le b) = \int_a^b f(x)\,dx = 1.$$

This is as expected, since the total area under the curve is 1.
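The definition F(t) = ∫ f(x) dx from a to t can be turned directly into a function. The sketch below is a minimal illustration, assuming the standard library only. It uses the p.d.f. $f(x) = \frac18 x$, 0 ≤ x ≤ 4, from Example 6.11 and confirms both that F(b) = 1 and that P(x₁ < X < x₂) = F(x₂) − F(x₁). The names `simpson` and `F` are illustrative.

```python
def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (n must be even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


A, B = 0.0, 4.0   # range on which f(x) is valid


def f(x):
    # p.d.f. from Example 6.11: f(x) = x/8 for 0 <= x <= 4, zero elsewhere
    return x / 8 if A <= x <= B else 0.0


def F(t):
    """Cumulative distribution function F(t) = P(X <= t)."""
    if t <= A:
        return 0.0                     # no probability has accumulated yet
    return simpson(f, A, min(t, B))    # area under f from A up to t (capped at B)


print(round(F(B), 6))               # total area under the curve
print(round(F(1.8) - F(0.3), 6))    # P(0.3 < X < 1.8)
# 1.0
# 0.196875
```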


Using F(x) to find P(x₁ < X < x₂)

P(X < x₂) = F(x₂)
P(X < x₁) = F(x₁)
P(x₁ < X < x₂) = F(x₂) − F(x₁)

Finding the median, quartiles and other percentiles

The median is the value 50% of the way through the distribution. It splits the area under the curve y = f(x) into two halves. If m is the median, then for f(x) defined for a ≤ x ≤ b,
$$\int_a^m f(x)\,dx = 0.5, \quad \text{i.e. } F(m) = 0.5.$$
[Sketch: y = f(x) with area 0.5 to the left of m.]

Note that if f(x) is symmetrical in the given range, the mean and median will coincide.

The lower quartile, q₁, is the value 25% of the way through the distribution, so
$$\int_a^{q_1} f(x)\,dx = 0.25, \quad \text{i.e. } F(q_1) = 0.25.$$

The upper quartile, q₃, is the value 75% of the way through the distribution, so
$$\int_a^{q_3} f(x)\,dx = 0.75, \quad \text{i.e. } F(q_3) = 0.75.$$

Similarly for other percentiles, for example F(10th percentile) = 0.1 and F(35th percentile) = 0.35.
In general
$$F(n\text{th percentile}) = \frac{n}{100}.$$

Example 6.16

X is a continuous random variable with p.d.f. as shown.
$$f(x) = \tfrac18 x, \quad 0 \le x \le 4$$
[Sketch of y = f(x): a straight line from (0, 0) to (4, ½).]
Find
(a) the cumulative distribution function F(x) and sketch y = F(x),
(b) P(0.3 < X < 1.8),
(c) the median, m,
(d) the interquartile range.

Solution 6.16

(a) For values of t between 0 and 4,
$$F(t) = \int_0^t \frac18 x\,dx = \left[\frac{x^2}{16}\right]_0^t = \frac{t^2}{16}$$
Note that $F(4) = \frac{16}{16} = 1$, as expected.
The cumulative distribution function can now be written in terms of x as follows:
$$F(x) = \begin{cases} 0 & x < 0 \\ \frac{x^2}{16} & 0 \le x \le 4 \\ 1 & x > 4 \end{cases}$$
[Sketch of y = F(x): rising from 0 at x = 0 to 1 at x = 4, and equal to 1 for x > 4.]

(b) P(0.3 < X < 1.8) = F(1.8) − F(0.3)
$$F(1.8) = \frac{1.8^2}{16} = 0.2025, \qquad F(0.3) = \frac{0.3^2}{16} = 0.005\,625$$
P(0.3 < X < 1.8) = 0.2025 − 0.005 625 = 0.197 (3 d.p.)

(c) For the median m, F(m) = 0.5
$$\frac{m^2}{16} = 0.5, \quad m^2 = 8, \quad m = 2.828\ldots$$
The median m = 2.83 (2 d.p.).
NOTE: take the positive square root, since 0 ≤ m ≤ 4.

(d)
$$F(q_1) = 0.25: \quad \frac{q_1^2}{16} = 0.25, \quad q_1^2 = 4, \quad q_1 = 2$$
$$F(q_3) = 0.75: \quad \frac{q_3^2}{16} = 0.75, \quad q_3^2 = 12, \quad q_3 = \sqrt{12} = 3.464\ldots$$
Interquartile range = q₃ − q₁ = 3.464… − 2 = 1.5 (2 s.f.).

Example 6.17

X is a continuous random variable with p.d.f. f(x) where
$$f(x) = \begin{cases} \frac{x}{3} & 0 \le x < 2 \\ -\frac{2x}{3} + 2 & 2 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$
(a) Find the cumulative distribution function F(x) and sketch it.
(b) Find P(1 < X < 2.5).
(c) Find the median, m.

Solution 6.17 


(a)
$$F(t) = \int_0^t f(x)\,dx$$
Since f(x) is given in two parts, F(x) must be found in two stages.

First consider t where 0 ≤ t ≤ 2.
$$F(t) = \int_0^t \frac{x}{3}\,dx = \left[\frac{x^2}{6}\right]_0^t = \frac{t^2}{6}$$
So, for 0 ≤ x ≤ 2, $F(x) = \frac{x^2}{6}$.

NOTE: $F(2) = \frac46 = \frac23$

Now consider t such that 2 ≤ t ≤ 3.
$$F(t) = F(2) + \int_2^t \left(-\frac{2x}{3} + 2\right)dx = \frac23 + \left[-\frac{x^2}{3} + 2x\right]_2^t = -\frac{t^2}{3} + 2t - 2$$

Writing the answer as a general formula in terms of x,
$$F(x) = \begin{cases} 0 & x < 0 \\ \frac{x^2}{6} & 0 \le x \le 2 \\ -\frac{x^2}{3} + 2x - 2 & 2 \le x \le 3 \\ 1 & x > 3 \end{cases}$$
[Sketch of y = F(x): rising from 0 at x = 0, through ⅔ at x = 2, to 1 at x = 3.]

(b) P(1 < X < 2.5) = F(2.5) − F(1)

To find F(2.5), use F(x) in the interval 2 ≤ x ≤ 3:
$$F(2.5) = -\frac{2.5^2}{3} + 2(2.5) - 2 = \frac{11}{12}$$
To find F(1), use F(x) in the interval 0 ≤ x ≤ 2:
$$F(1) = \frac16$$
$$P(1 < X < 2.5) = F(2.5) - F(1) = \frac{11}{12} - \frac16 = 0.75$$

(c) F(m) = 0.5, where m is the median.

Since F(2) = ⅔, the median must be less than 2, so consider F(x) in the range 0 ≤ x ≤ 2.
$$F(m) = \frac{m^2}{6} = 0.5, \quad m^2 = 3, \quad m = 1.73 \text{ (2 d.p.)}$$
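Because F(x) is continuous and never decreases, any percentile can also be found by repeatedly halving an interval until F(x) equals the required proportion (bisection). The sketch below is a minimal illustration, assuming the standard library only. It applies bisection to the c.d.f. of Example 6.17. The function name `percentile` is illustrative.

```python
def F(x):
    # c.d.f. from Example 6.17
    if x < 0:
        return 0.0
    if x <= 2:
        return x * x / 6
    if x <= 3:
        return -x * x / 3 + 2 * x - 2
    return 1.0


def percentile(p, lo=0.0, hi=3.0, tol=1e-12):
    """Bisection for the value x with F(x) = p; relies on F being continuous and non-decreasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid   # not enough probability yet, so the answer lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


median = percentile(0.5)
q1, q3 = percentile(0.25), percentile(0.75)
print(round(median, 4), round(q1, 4), round(q3, 4))
# 1.7321 1.2247 2.134
```

The median agrees with m = √3 found above. Note that q₃ lies in the second piece of F(x); bisection handles the change of formula automatically.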


Exercise 6d Cumulative distribution function 


1. The random variable X has probability density function
$$f(x) = \tfrac38 x^2, \quad 0 \le x \le 2$$
Find
(a) the cumulative distribution function, F(x), and draw a sketch of y = F(x),
(b) the median, m.

2. The random variable X has probability density function
$$f(x) = \tfrac14(4 - x), \quad 1 \le x \le 3$$
Find
(a) the cumulative distribution function F(x),
(b) P(1.5 < X < 2).

3. The random variable X has probability density function
$$f(x) = k, \quad 1 \le x \le 6$$
(a) Find k.
(b) Find the cumulative distribution function F(x).
(c) Find the 20th percentile.
(d) Find the interquartile range.

4. The random variable X has probability density function
$$f(x) = \begin{cases} \frac14 & 0 \le x < 2 \\ \frac14(2x - 3) & 2 \le x \le 3 \end{cases}$$
Find
(a) the cumulative distribution function F(x),
(b) the median m.


5. The random variable X has cumulative distribution function
$$F(x) = \begin{cases} 0 & x < 0 \\ \ldots & 0 \le x \le 1 \\ 1 & x > 1 \end{cases}$$
Find
(a) P(0.3 < X < 0.6),
(b) the median m,
(c) the value of a such that P(X > a) = 0.4.

6. The continuous random variable X has p.d.f. $f(x) = \frac13$, $0 \le x \le 3$. Find
(a) E(X), (b) Var(X),
(c) F(x) and sketch y = F(x), (d) P(X ≥ 1.8),
(e) P(1.1 ≤ X ≤ 1.7).

7. X is the continuous random variable with p.d.f. $f(x) = kx^{\ldots}$, $1 \le x \le 2$. Find (a) the constant k and sketch y = f(x), (b) the standard deviation σ, (c) the cumulative distribution function F(x), (d) the median, m.

8. The continuous random variable X has probability density function f given by
$$f(x) = \begin{cases} k(4 - x^2) & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
where k is a constant. Show that $k = \frac{3}{16}$ and find the values of E(X) and Var(X).
Find the cumulative distribution function of X, and verify by calculation that the median value of X is between 0.69 and 0.70.
Find also P(0.69 < X < 0.70), giving your answer correct to one significant figure. (C)

9. The continuous random variable X has continuous p.d.f. f(x) where
$$f(x) = \begin{cases} \frac{x}{3} - \frac23 & 2 \le x < 3 \\ \frac13 & 3 \le x < 5 \\ \alpha - \beta x & 5 \le x \le 6 \\ 0 & \text{otherwise} \end{cases}$$
Find (a) α and β, (b) F(x) and sketch y = F(x), (c) P(2 < X < 3.5), (d) P(X > 5.5).

10. The continuous random variable X has probability density function
$$f(x) = \begin{cases} \frac{1 + x}{6} & 1 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$
(a) Sketch the probability density function of X.
(b) Calculate the mean of X.
(c) Specify fully the cumulative distribution function of X.
(d) Find m such that P(X < m) = …. (L)


11. A factory is supplied with flour at the beginning of each week. The weekly demand, X thousand tonnes, for flour from this factory is a continuous random variable having the probability density function
$$f(x) = \begin{cases} k(1 - x)^4 & 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find
(a) the value of k,
(b) the mean value of X,
(c) the variance of X, to three decimal places.
Sketch the probability density function.
Find, to the nearest tonne, the quantity of flour that the factory should have in stock at the beginning of a week in order that there is a probability of 0.98 that the demand in that week will be met. (L)

12. A continuous random variable X has probability density function, f, defined by
$$f(x) = \begin{cases} \ldots & 0 \le x < 1 \\ \ldots & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Obtain the distribution function and hence, or otherwise, find, to three decimal places, the median and the interquartile range of the distribution. (L)

13. The continuous random variable X has probability density function f given by
$$f(x) = \begin{cases} k(x + 3) & -3 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$
where k is a constant.
(a) Show that $k = \frac{1}{18}$.
(b) Find E(X) and Var(X).
(c) Find the lower quartile of X, i.e. the value q such that P(X < q) = ¼.
(d) Let Y = aX + b, where a and b are constants with a > 0. Find the values of a and b for which E(Y) = 0 and Var(Y) = 1. (C)

14. The continuous random variable, X, has probability density function defined by
$$f(x) = \begin{cases} kx & 0 \le x < 8 \\ 8k & 8 \le x \le 9 \\ 0 & \text{otherwise} \end{cases}$$
where k is a constant.
(a) Sketch the graph of f(x).
(b) Show that k = 0.025.
(c) Determine, for all x, the distribution function F(x).
(d) Calculate the probability that an observed value of X exceeds 6. (NEAB)

15. A continuous random variable, X, has probability density function given by
$$f(x) = \begin{cases} ax - bx^2 & 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
Observations on X indicate that the mean is 1.
(a) Obtain two simultaneous equations for a and b, show that a = 1.5 and find the value of b.
(b) Find the variance of X.
(c) If F(x) is the probability that X ≤ x, find F(x).
(d) If two independent observations are made on X, what is the probability that at least one of them is less than …?

16. The continuous random variable X has probability density function given by
$$f(x) = \begin{cases} \ldots \\ 0 & \text{otherwise} \end{cases}$$
where k is a constant. Giving your answers correct to three significant figures where appropriate, find
(a) the value of k, and also the median value of X,
(b) the mean and variance of X,
(c) the cumulative distribution function, F, of X, and sketch the graph of y = F(x). (C)


OBTAINING THE P.D.F., f(x), FROM THE CUMULATIVE DISTRIBUTION FUNCTION F(x)

Since F can be obtained by integrating f, it follows that f can be obtained by differentiating F:
$$f(x) = \frac{d}{dx}F(x) = F'(x)$$

NOTE: the gradient of the F(x) curve gives the value of f(x).


Example 6.18 
The continuous random variable X has cumulative distribution function F(x) where
$$F(x) = \begin{cases} 0 & x < 0 \\ \frac{x^3}{27} & 0 \le x \le 3 \\ 1 & x > 3 \end{cases}$$
[Sketch of y = F(x) for 0 ≤ x ≤ 3.]

(a) State the range of values for which the probability density function is valid.
(b) Find f(x) and illustrate it in a sketch.

Solution 6.18

(a) Since F(x) is unchanging in the regions x < 0 and x > 3, it follows that f(x) must be zero for x < 0 and x > 3.
So f(x) is valid for 0 ≤ x ≤ 3 and f(x) = 0 otherwise.

(b)
$$f(x) = \frac{d}{dx}F(x) = \frac{d}{dx}\left(\frac{x^3}{27}\right) = \frac{3x^2}{27} = \frac{x^2}{9}$$
[Sketch of y = f(x) for 0 ≤ x ≤ 3.]

The p.d.f. for X is f(x) where
$$f(x) = \begin{cases} \frac{x^2}{9} & 0 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$
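The relation f(x) = F′(x) can be checked without any algebra by estimating the gradient of F numerically. The sketch below is a minimal illustration, assuming the standard library only. It uses a central-difference estimate, a standard numerical technique not used in the text, on the c.d.f. of Example 6.18, and compares the result with x²/9. The name `pdf_from_cdf` is illustrative. The estimate is only meaningful at points where F is smooth, so it should not be used exactly at the ends of the range.

```python
def F(x):
    # c.d.f. from Example 6.18
    if x < 0:
        return 0.0
    if x <= 3:
        return x ** 3 / 27
    return 1.0


def pdf_from_cdf(F, x, h=1e-5):
    """Central-difference estimate of f(x) = F'(x), the gradient of the F(x) curve."""
    return (F(x + h) - F(x - h)) / (2 * h)


for x in (0.5, 1.5, 2.5, 4.0):
    exact = x * x / 9 if 0 <= x <= 3 else 0.0
    print(x, round(pdf_from_cdf(F, x), 6), round(exact, 6))
```

Where F(x) is flat (here x > 3) the estimated gradient is 0, which matches the reasoning in part (a).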


Example 6.19 


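The continuous random variable X has cumulative distribution function F(x) as shown in the sketch.
$$F(x) = \begin{cases} 0 & x < -2 \\ \frac{1}{12}(2 + x) & -2 \le x < 0 \\ \frac16(1 + x) & 0 \le x < 4 \\ \frac{1}{12}(6 + x) & 4 \le x < 6 \\ 1 & x \ge 6 \end{cases}$$
[Sketch of y = F(x) for −2 ≤ x ≤ 6.]

(a) Find the p.d.f. of X, f(x), and sketch y = f(x).
(b) Find E(X).

Solution 6.19

(a) Since F(x) is unchanging for x < −2 and x > 6, it follows that f(x) must be zero for x < −2 and x > 6.
Since $f(x) = \frac{d}{dx}F(x)$,
$$\text{for } -2 \le x < 0, \quad f(x) = \frac{d}{dx}\left(\frac{1}{12}(2 + x)\right) = \frac{1}{12}$$
$$\text{for } 0 \le x < 4, \quad f(x) = \frac{d}{dx}\left(\frac16(1 + x)\right) = \frac16$$
$$\text{for } 4 \le x \le 6, \quad f(x) = \frac{d}{dx}\left(\frac{1}{12}(6 + x)\right) = \frac{1}{12}$$
The sketch of y = f(x) is shown:
[Sketch of y = f(x): height $\frac{1}{12}$ for −2 ≤ x < 0, $\frac16$ for 0 ≤ x < 4 and $\frac{1}{12}$ for 4 ≤ x ≤ 6.]

(b) Since f(x) is symmetrical about x = 2, E(X) = 2.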


Example 6.20 


The continuous random variable X has cumulative distribution function given by 


$$F(x) = \begin{cases} 0 & x < 0 \\ 2x - x^2 & 0 \le x \le 1 \\ 1 & x > 1 \end{cases}$$

(a) Show that P(X < ½) = ¾.
(b) Find the interquartile range of X.

Solution 6.20

(a) $P(X < \tfrac12) = F(\tfrac12) = 2 \times \tfrac12 - (\tfrac12)^2 = 0.75$

(b) To find the interquartile range, you need to find the upper quartile and the lower quartile.

Upper quartile q₃ is such that F(q₃) = 0.75.
From (a), F(½) = 0.75, so q₃ = ½.

Lower quartile q₁ is such that F(q₁) = 0.25:
$$2q_1 - q_1^2 = 0.25$$
$$q_1^2 - 2q_1 + 0.25 = 0$$
$$(q_1 - 1)^2 - 1 + 0.25 = 0$$
$$(q_1 - 1)^2 = 0.75$$
$$q_1 - 1 = \pm\sqrt{0.75}$$
So $q_1 = 1 + \sqrt{0.75}$ or $q_1 = 1 - \sqrt{0.75}$.

Since F(x) is unchanging for x > 1, f(x) = 0 for x > 1.
So $1 + \sqrt{0.75}$ is outside the range of f(x).
$$q_1 = 1 - \sqrt{0.75} = 0.1339\ldots$$
Interquartile range = q₃ − q₁ = 0.5 − 0.1339… = 0.37 (2 s.f.)


Exercise 6e Obtaining f(x) from F(x) 


1. The cumulative distribution function of X is given by
$$F(x) = \begin{cases} 0 & x < 2 \\ 0.25x - 0.5 & 2 \le x \le 6 \\ 1 & x > 6 \end{cases}$$
(a) Find the probability density function f(x).
(b) Sketch y = f(x).
(c) Find E(X).
(d) Find the interquartile range.

2. The cumulative distribution function of X is given by
$$F(x) = \begin{cases} 0 & x < 0 \\ x^2 & 0 \le x \le 1 \\ 1 & x > 1 \end{cases}$$
Find
(a) the median,
(b) the mean.

3. The cumulative distribution function of X is given by
$$F(x) = \begin{cases} 0 & x < 0 \\ x - kx^2 & 0 \le x \le 2 \\ 1 & x > 2 \end{cases}$$
Find
(a) the value of k,
(b) the probability density function f(x),
(c) the median of X,
(d) the variance of X.

4. The continuous random variable X has cumulative distribution function F(x) where
$$F(x) = \begin{cases} 0 & x < 0 \\ \ldots & 0 \le x \le 1 \\ \ldots + k & 1 \le x \le 2 \\ 1 & x > 2 \end{cases}$$
Find
(a) the value of k,
(b) the p.d.f. f(x) and sketch it,
(c) the mean μ,
(d) the standard deviation σ.

5. The continuous random variable X has cumulative distribution function F(x) where
$$F(x) = \begin{cases} 0 & x < 1 \\ \frac{(x - 1)^2}{12} & 1 \le x \le 3 \\ \frac{14x - x^2 - 25}{24} & 3 \le x \le 7 \\ 1 & x > 7 \end{cases}$$
Find
(a) the p.d.f. f(x) and sketch it,
(b) E(X),
(c) Var(X),
(d) the median of X.

6. The continuous random variable X has (cumulative) distribution function given by
$$F(x) = \begin{cases} \frac{1 + x}{3} & -1 \le x \le 0 \\ \ldots & 0 \le x \le 2 \\ \ldots & 2 \le x \le 3 \end{cases}$$
where F(x) = 0 for x < −1, and F(x) = 1 for x > 3.
(a) Sketch the graph of the probability density function f(x).
(b) Determine the expectation of X and the variance of X.
(c) Determine P(3 < 2X < 5). (C)

7. A continuous random variable X takes values in the interval 0 to 3. It is given that
$$P(X > x) = a + bx^3, \quad 0 \le x \le 3.$$
(a) Find the values of the constants a and b.
(b) Find the cumulative distribution function F(x).
(c) Find the probability density function f(x).
(d) Show that E(X) = 2.25.

8. The length X of an offcut of wooden planking is a random variable which can take any value up to 0.5 m. It is known that the probability of the length being not more than x metres (0 ≤ x ≤ 0.5) is equal to kx. Determine
(a) the value of k,
(b) the probability density function of X,
(c) the expected value of X,
(d) the standard deviation of X (correct to three significant figures). (C)
THE CONTINUOUS UNIFORM (OR RECTANGULAR) DISTRIBUTION

Consider the continuous random variable X with probability density function
$$f(x) = k, \quad 1 \le x \le 6.$$
Since the total area under the curve is 1,
$$5k = 1, \quad k = 0.2$$
$$f(x) = 0.2, \quad 1 \le x \le 6$$
[Sketch of y = f(x): a horizontal line at height 0.2 from x = 1 to x = 6.]

X is said to follow a continuous uniform, or rectangular, distribution between 1 and 6.

In general:

The probability density function for a continuous random variable X, distributed uniformly in the interval a ≤ x ≤ b, is
$$f(x) = \begin{cases} \frac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
This is written X ~ R(a, b).
a and b are known as the parameters of the distribution.

[Sketch of y = f(x): a rectangle of height 1/(b − a) from x = a to x = b.]
NOTE: It is easy to see from the diagram that the total area is 1:
$$\text{Area} = (b - a) \times \frac{1}{b - a} = 1$$

Example 6.21 

X is distributed uniformly, where 6 ≤ x ≤ 9.

Find P(7.2 < X < 8.4).

Solution 6.21

$$f(x) = \frac{1}{b - a} = \frac{1}{9 - 6} = \frac13$$
$$P(7.2 < X < 8.4) = \frac13(8.4 - 7.2) = 0.4$$


Example 6.22 


The lengths of metal rods are measured to the nearest 5 mm. What is the distribution of the 
random variable E, the rounding error made when measuring? Give its probability density 
function f(e). 


Solution 6.22 


The error is the difference between the true length and the recorded length after rounding to the nearest 5 mm.

Suppose you have recorded a length to be 75 mm, to the nearest 5 mm. The true length, l, could have been any length in the interval
$$72.5 \text{ mm} \le l < 77.5 \text{ mm}$$
So the error, E, could be anywhere in the interval −2.5 ≤ E < 2.5.

All points in this interval are equally likely 'stopping places' for E, so E is uniformly distributed in the interval, i.e.
$$E \sim R(-2.5, 2.5)$$
$$f(e) = \frac{1}{2.5 - (-2.5)} = \frac15, \quad -2.5 \le e \le 2.5$$


Example 6.23 


Rosie spins a 'Spinning Jenny' at a fair. When the wheel stops, the shorter distance of an arrow measured along the circumference from Rosie is denoted by C. What is the distribution of C?

Solution 6.23

All the points on the circumference are equally likely stopping places for the arrow, so C is uniformly distributed between 0 (when the arrow is next to Rosie) and πr (when the arrow is diametrically opposite Rosie), where r is the radius of the wheel.

[Diagram: the wheel, with the arrow and Rosie marked on the circumference.]

So C ~ R(0, πr)
$$f(c) = \frac{1}{\pi r}, \quad 0 \le c \le \pi r.$$


Example 6.24 
The error, in grams, made by a greengrocer's scales may be modelled by the random variable, X, with probability density function
$$f(x) = \begin{cases} 0.1 & -3 \le x \le 7 \\ 0 & \text{otherwise.} \end{cases}$$
Find the probability that
(a) an error is positive,
(b) the magnitude of an error exceeds 2 grams (i.e. |X| > 2),
(c) the magnitude of an error is less than 4 grams (i.e. |X| < 4). (AEB)

Solution 6.24

(a) P(X > 0) = 7 × 0.1 = 0.7

(b) P(|X| > 2) = 1 − P(|X| ≤ 2)
= 1 − P(−2 ≤ X ≤ 2)
= 1 − 4 × 0.1
= 0.6

(c) P(|X| < 4) = P(−4 < X < 4)
Since f(x) = 0 when x < −3, find P(−3 < X < 4).
P(−3 < X < 4) = 7 × 0.1 = 0.7
So P(|X| < 4) = 0.7.
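For a uniform distribution every probability is a length of interval multiplied by the constant height 1/(b − a), after trimming the interval to the range where f(x) is non-zero. That trimming is exactly the step "since f(x) = 0 when x < −3" in part (c). The sketch below is a minimal illustration, assuming the standard library only. The function name `uniform_prob` is illustrative.

```python
def uniform_prob(a, b, c, d):
    """P(c < X < d) for X ~ R(a, b): the part of (c, d) inside (a, b), as a fraction of b - a."""
    lo, hi = max(a, c), min(b, d)      # trim (c, d) to the range where f(x) is non-zero
    return max(0.0, hi - lo) / (b - a)


positive = uniform_prob(-3, 7, 0, float("inf"))   # Example 6.24 (a): P(X > 0)
exceeds_2 = 1 - uniform_prob(-3, 7, -2, 2)        # Example 6.24 (b): P(|X| > 2)
under_4 = uniform_prob(-3, 7, -4, 4)              # Example 6.24 (c): P(|X| < 4)
ex_6_21 = uniform_prob(6, 9, 7.2, 8.4)            # Example 6.21: P(7.2 < X < 8.4)

print(positive, round(exceeds_2, 10), under_4, round(ex_6_21, 10))
```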


EXPECTATION AND VARIANCE OF THE UNIFORM DISTRIBUTION

Example 6.25

The continuous random variable Y has a rectangular distribution
$$f(y) = \begin{cases} \frac1a & -\frac a2 \le y \le \frac a2 \\ 0 & \text{otherwise} \end{cases}$$
(a) Find the mean of Y.
(b) Find the variance of Y.

Solution 6.25

Sketch f(y).
[Sketch of y = f(y): a rectangle of height 1/a from −a/2 to a/2.]

(a) By symmetry, E(Y) = 0.
The mean of Y is 0.

(b) To find Var(Y), find E(Y²) first.
$$E(Y^2) = \int_{\text{all } y} y^2 f(y)\,dy = \int_{-a/2}^{a/2} \frac{y^2}{a}\,dy = \frac1a\left[\frac{y^3}{3}\right]_{-a/2}^{a/2} = \frac1a\left(\frac{a^3}{24} + \frac{a^3}{24}\right) = \frac{a^2}{12}$$
$$\operatorname{Var}(Y) = E(Y^2) - E^2(Y) = \frac{a^2}{12} - 0 = \frac{a^2}{12}$$
The variance of Y is $\frac{a^2}{12}$.

It is possible to write the mean and the variance of a uniform distribution as general formulae.

If the continuous random variable X is uniformly distributed over the interval (a, b), then X ~ R(a, b).

By symmetry
$$E(X) = \tfrac12(a + b)$$
It can also be shown that
$$\operatorname{Var}(X) = \tfrac{1}{12}(b - a)^2$$
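THE CUMULATIVE DISTRIBUTION FUNCTION, F(x), FOR A UNIFORM DISTRIBUTION

Example 6.26

X has probability density function $f(x) = \frac14$, $5 \le x \le 9$. Find F(x).

Solution 6.26

By integration:
If 5 ≤ t ≤ 9,
$$F(t) = \int_5^t \frac14\,dx = \left[\frac x4\right]_5^t = \frac t4 - \frac54 = \frac{t - 5}{4}$$
So
$$F(x) = \begin{cases} 0 & x < 5 \\ \frac{x - 5}{4} & 5 \le x \le 9 \\ 1 & x > 9 \end{cases}$$

Diagrammatically:
[Sketch of y = f(x) = ¼ for 5 ≤ x ≤ 9, with the rectangle from 5 to t shaded.]
$$F(t) = (t - 5) \times \frac14 = \frac{t - 5}{4}$$

In general, F(x) can be illustrated diagrammatically:
[Sketch of y = F(x) for X ~ R(a, b): 0 for x < a, rising linearly from 0 at x = a to 1 at x = b, and 1 for x > b.]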

Exercise 6f Uniform distribution

1. X follows a uniform distribution with probability density function
$$f(x) = k, \quad 3 \le x \le 6.$$
Find
(a) k,
(b) E(X),
(c) Var(X),
(d) P(X > 5).

2. X is distributed uniformly, −5 ≤ x ≤ −2.
Find
(a) P(−4.3 < X < −2.8),
(b) E(X),
(c) the standard deviation of X.

3. The continuous random variable X has p.d.f. f(x) as shown in the diagram:
[Diagram: graph of y = f(x), a rectangle of height k.]
Find
(a) the value of k,
(b) P(… < X < 3.4),
(c) E(X),
(d) Var(X).

4. The random variable X has p.d.f. f(x) as shown in the diagram.
[Diagram: graph of y = f(x).]
If two independent observations of X are made, find the probability that one is less than 1.5 and the other is greater than the mean.

5. The random variable Y has probability density function given by
$$f(y) = \begin{cases} \frac15 & 32 \le y \le 37 \\ 0 & \text{otherwise} \end{cases}$$
Find the probability that Y lies within one standard deviation of the mean.

6. X has cumulative distribution function
$$F(x) = \frac{x - 2}{5}, \quad 2 \le x \le 7.$$
Find
(a) E(X),
(b) Var(X).

7. The continuous random variable X is uniformly distributed in the interval a ≤ x ≤ b. The lower quartile is … and the upper quartile is …. Find
(a) the values of a and b,
(b) P(6 < X < 7),
(c) the cumulative distribution function F(x).

8. X has cumulative distribution function F(x) illustrated as follows:
[Diagram: graph of y = F(x).]
Find the probability density function f(x).
Find the interquartile range.
Find the … percentile.

Summary

• For a continuous random variable X, with p.d.f. f(x) for a ≤ x ≤ b,
$$\int_{\text{all } x} f(x)\,dx = 1, \qquad P(c \le X \le d) = \int_c^d f(x)\,dx, \text{ where } a \le c \le d \le b.$$
• Expectation: $E(X) = \int_{\text{all } x} x f(x)\,dx$
• Variance: $\operatorname{Var}(X) = \int_{\text{all } x} x^2 f(x)\,dx - E^2(X)$
• The cumulative distribution function: $F(t) = \int_a^t f(x)\,dx$ for a ≤ t ≤ b
• To obtain f(x) from F(x), differentiate F(x): $f(x) = \frac{d}{dx}F(x) = F'(x)$
• Median, quartiles and other percentiles:
  Median m: F(m) = 0.5
  Lower quartile q₁: F(q₁) = 0.25
  Upper quartile q₃: F(q₃) = 0.75
  nth percentile: F(nth percentile) = n/100
  Interquartile range = q₃ − q₁
• The continuous uniform (rectangular) distribution:
  If $f(x) = \frac{1}{b - a}$, a ≤ x ≤ b, then X ~ R(a, b).
  $E(X) = \tfrac12(a + b)$
  $\operatorname{Var}(X) = \tfrac{1}{12}(b - a)^2$
$$F(x) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$$
Miscellaneous worked examples


Example 6.27 
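The random variable X has probability density function
$$f(x) = \begin{cases} 3x^k & 0 \le x \le 1 \\ 0 & \text{otherwise,} \end{cases}$$
where k is a positive integer.
Find
(a) the value of k,
(b) the mean of X,
(c) the value, x₁, such that P(X < x₁) = 0.5.

Solution 6.27

(a) Since X is a random variable, $\int_{\text{all } x} f(x)\,dx = 1$.
Therefore
$$\int_0^1 3x^k\,dx = 1$$
$$3\left[\frac{x^{k+1}}{k + 1}\right]_0^1 = 1$$
$$\frac{3}{k + 1} = 1$$
$$k + 1 = 3, \quad k = 2$$
So
$$f(x) = \begin{cases} 3x^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

(b)
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^1 3x^3\,dx = \left[\frac{3x^4}{4}\right]_0^1 = 0.75$$
The mean of X is 0.75.

(c) Let P(X < x₁) = 0.5.
Therefore
$$\int_0^{x_1} 3x^2\,dx = 0.5$$
$$\left[x^3\right]_0^{x_1} = 0.5$$
$$x_1^3 = 0.5$$
$$x_1 = \sqrt[3]{0.5} = 0.794 \text{ (3 d.p.)}$$
So x₁ = 0.794 (3 d.p.)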
Example 6.28


The lengths of blades of grass mown from a lawn are modelled by a uniform distribution between 1 cm and 5 cm.

(a) Find the standard deviation of this distribution.
(b) Find the percentage of blades of grass whose lengths lie within one standard deviation of the mean length.
(c) A better model may be a triangular distribution as shown.
[Diagram: a triangular probability density function involving the constant C.]
Find the value of C. (NEAB)

Solution 6.28

(a) X is the length, in centimetres, of a blade of grass.
f(x) = ¼, 1 ≤ x ≤ 5 (see page 345)
$$\operatorname{Var}(X) = \tfrac{1}{12}(5 - 1)^2 = \tfrac{16}{12} = \tfrac43 \quad \text{(see page 348)}$$
Standard deviation of X = $\sqrt{\tfrac43}$ = 1.15 (2 d.p.)

(b) E(X) = 3
$$P\left(3 - \sqrt{\tfrac43} < X < 3 + \sqrt{\tfrac43}\right) = 2\sqrt{\tfrac43} \times \tfrac14 = 0.577\ldots$$
So approximately 58% of blades of grass have length within one standard deviation of the mean.

(c) Total area = 1
Area of rectangle = 4 × … = …
Area of triangle = ½ × 4 × h = 2h
…


Example 6.29 


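On any day, the amount of time, measured in hours, that Mr Goggle spends watching television is a continuous random variable T, with cumulative distribution function given by
$$F(t) = \begin{cases} 0 & t < 0 \\ 1 - k(15 - t)^2 & 0 \le t \le 15 \\ 1 & t > 15 \end{cases}$$
where k is a constant.

(a) Show that $k = \frac{1}{225}$ and find P(5 < T < 10).
(b) Show that, for 0 ≤ t ≤ 15, the probability density function of T is given by
$$f(t) = \frac{2}{15} - \frac{2t}{225}.$$
(c) Find the median of T. (C)

Solution 6.29

(a) When t = 0, F(t) = 0.
Using F(t) = 1 − k(15 − t)², when t = 0,
$$0 = 1 - k \times 15^2, \quad k = \frac{1}{225}$$
$$P(5 < T < 10) = F(10) - F(5) = \left(1 - \frac{1}{225}(15 - 10)^2\right) - \left(1 - \frac{1}{225}(15 - 5)^2\right)$$
$$= \left(1 - \frac19\right) - \left(1 - \frac49\right) = \frac13$$

(b) For t < 0 or t > 15, f(t) = 0.
For 0 ≤ t ≤ 15, f(t) = F′(t):
$$f(t) = -\frac{1}{225} \times 2(15 - t) \times (-1) = \frac{2}{225}(15 - t) = \frac{2}{15} - \frac{2t}{225}$$

(c) Let the median of T be m, so F(m) = 0.5.
$$1 - \frac{1}{225}(15 - m)^2 = 0.5$$
$$(15 - m)^2 = 112.5$$
$$15 - m = \pm\sqrt{112.5}$$
$$m = 15 - \sqrt{112.5} \quad \text{or} \quad m = 15 + \sqrt{112.5}$$
Since f(t) is valid only for 0 ≤ t ≤ 15,
$$m = 15 - \sqrt{112.5} = 4.393\ldots$$
Median = 4.4 (2 s.f.)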
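Each part of Example 6.29 can be checked numerically once k is known. The sketch below is a minimal check, assuming the standard library only. It evaluates F(0) to confirm k = 1/225, computes P(5 < T < 10), and finds the median twice: from the closed form 15 − √112.5 and by bisection on F, which needs no choice between the two square roots. The names `F` and `bisect` are illustrative. The latter is a hand-written helper, not Python's `bisect` module.

```python
K = 1 / 225   # value of k shown in part (a)


def F(t):
    # c.d.f. from Example 6.29
    if t < 0:
        return 0.0
    if t > 15:
        return 1.0
    return 1 - K * (15 - t) ** 2


def bisect(g, target, lo, hi, tol=1e-12):
    """Solve g(x) = target for a continuous, non-decreasing g on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


prob = F(10) - F(5)                        # P(5 < T < 10)
median_exact = 15 - 112.5 ** 0.5           # closed form from part (c)
median_numeric = bisect(F, 0.5, 0.0, 15.0)

print(round(F(0), 12), round(prob, 6), round(median_exact, 4), round(median_numeric, 4))
```

Bisection searches only within 0 ≤ t ≤ 15, so it returns the root 15 − √112.5 directly. This is the same reason the text rejects 15 + √112.5.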


Miscellaneous exercise 6g 


een AE TT 


ll 


1. A continuous random variable X has a 
probability density function, f, defined by 


fx)=4x 0<x<2, 
f(x)=0 otherwise. 


Find the expected value of 


{a) X, 
(b) 2X+4 (NEAB) 


2. (a) Accontinuous variable X is distributed at 
random between the values 2 and 3 and has 


6 
a probability density function of —;. 
x 


Find the median value of X. 

(b) A continuous random variable X takes 
values between 0 and 1, with a probability 
density function of Ax(1 — x)*. Find the 
value of A, and the mean and standard 
deviation of X, 


3. A continuous random variable X has probability density function f(x) given by f(x) = 0 for x < 0 and x > 3, and between x = 0 and x = 3 its form is as shown in the graph.

[Graph: f(x) for 0 ≤ x ≤ 3]

(a) Find the value of A.

(b) Express f(x) algebraically and obtain the mean and variance of X.

(c) Find the median value of X.

A sample X₁, X₂ and X₃ is obtained. What is the probability that at least one is greater than the median value?


4. The number of kilograms of metal extracted from 10 kg of ore from a certain mine is a continuous random variable X with probability density function f(x), where f(x) = cx(2 − x)² if 0 ≤ x ≤ 2 and f(x) = 0 otherwise, where c is a constant.

Show that c = 0.75, and find the mean and variance of X.

The cost of extracting the metal from 10 kg of ore is £10x. Find the expected cost of extracting the metal from 10 kg of ore. (MEI)


5. The continuous random variable X has probability density function f(x) defined by

f(x) = c/x⁴,        x < −1
f(x) = c(2 − x²),   −1 ≤ x ≤ 1
f(x) = c/x⁴,        x > 1

(a) Show that c = ¼.

(b) Sketch the graph of f(x).

(c) Determine the cumulative distribution function F(x).

(d) Determine the expected value of X and the variance of X. (C)


6. A continuous variable X is distributed at random between the values x = 0 and x = 2, and has a probability density function of ax² + bx. The mean is 1.25.

(a) Show that b = 3/4, and find the value of a.

(b) Find the variance of X.

(c) Verify that the median value of X is approximately 1.3.

(d) Find the mode.


7. The continuous random variable X has probability density function given by

f(x) = cx²,         0 ≤ x ≤ 2
f(x) = 2c(4 − x),   2 < x ≤ 4
f(x) = 0,           otherwise

where c is a constant.

(a) Show that c = 0.15.

(b) Find the mean of X.

(c) Find the lower quartile of X.

(d) Find the probability that a single observation of X lies between the lower quartile and the mean.

(e) Three independent observations of X are taken. Find the probability that one of the observations is greater than the mean and the other two are less than the median value of X. (C)


8. The total number of radio taxi calls received at a control centre in a month is modelled by a random variable X (in tens of thousands of calls) having the probability density function

f(x) = cx,         0 ≤ x ≤ 1
f(x) = c(2 − x),   1 < x ≤ 2
f(x) = 0,          otherwise

[Diagram: sketch of f(x) for 0 ≤ x ≤ 2]

(a) Show that the value of c is 1.
(b) Write down the probability that X < 1.
(c) Show that the cumulative distribution function of X is

F(x) = 0,               x < 0
F(x) = ½x²,             0 ≤ x ≤ 1
F(x) = 2x − ½x² − 1,    1 < x ≤ 2
F(x) = 1,               x > 2

(d) Find the probability that the control centre receives between 8000 and 12 000 calls in a month.

A colleague criticises the model on the grounds that the number of radio calls must be discrete, while the model used for X is continuous.

(e) State briefly whether you consider that it was reasonable to use this model for X.

(f) Give two reasons why the probability density function in the diagram might be unsuitable as a model.

(g) Sketch the shape of a more suitable probability density function. (L)


9. The lifetime, in tens of hours, of a certain delicate electrical component is modelled by the random variable X with probability density function


fx) = ee O<x<9 


otherwise 


where k is a positive constant.

(a) Show that k = 2/81.


(b) Find the mean lifetime of a component. 

(c) Show that the standard deviation of 
lifetimes is 21.2 hours. 

(d) Find the probability that a component lasts 
at most 50 hours. 


A particular device requires two of these components and it will not operate if one or more of the components fail. The device has just been fitted with two new components. The lifetimes of components are independent.

(e) Find the probability that the device will work for more than 50 hours.

(f) Give a reason why the above distribution may not be realistic as a model for the distribution of lifetimes of these electrical components. (L)


10. The times, in excess of two hours, taken to complete a marathon road race are modelled by the continuous random variable T hours, where T has the probability density function

f(t) = (4/27)t²(3 − t),   0 ≤ t ≤ 3
f(t) = 0,                 otherwise

The diagram shows a sketch of the probability density function.

[Diagram: sketch of f(t) for 0 ≤ t ≤ 3]

(a) Find the mean and variance of the times taken to complete the race.

(b) Find the modal time taken to complete the race.

(c) What proportion of competitors complete the race in less than the modal time?

(d) Show that the median time to complete the race lies between the mean and the mode. (MEI)


11. A tennis player hits a ball against a wall, aiming at a fixed horizontal line on the wall. The vertical distance from the horizontal line to the point where the ball strikes the wall is recorded as positive for points above the line and negative for points below the line.

It is assumed that the distribution of this vertical distance, X metres, may be modelled by the probability density function

f(x) = 1.5(1 − 4x²),   −0.5 ≤ x ≤ 0.5
f(x) = 0,              otherwise

(a) State the probability that the ball strikes the wall precisely 0.25 m above the line.

(b) Determine the probability that the ball strikes the wall more than 0.25 m above the line.

(c) (i) Give a reason why the mean value of X is zero.
(ii) Calculate the variance of X.

(d) Give one reason why the above probability model may not be appropriate.

(e) Suggest one likely effect of repeated practice on the above probability model. (NEAB)


12. A horticultural firm is studying the number of hours that daffodils will last in a vase of water with a new additive. The random variable X (in hundreds of hours) with probability density function


_ 2 
fx)= - x’) O<x<2 


otherwise 


is proposed as a model. 


(a) Show that the value of the constant k is %. 

(b) Find the mean number of hours that a daffodil will last, according to this model.

(c) Use this model to find the probability that a daffodil will last for more than 100 hours.

The new additive is tested on carnations and it is found that several of these last for more than 250 hours.

(d) Explain why the random variable X, with probability density function f(x) as defined above, would not be a suitable model in this case.

(e) Suggest how the probability density function could be changed to model the time carnations will last. (L)


13. Each batch of a chemical used in drug manufacture is tested for impurities. The percentage of impurity is X, where X is a random variable with probability density function given by

f(x) = kx,           0 ≤ x ≤ 1
f(x) = ⅓k(4 − x),    1 < x ≤ 4
f(x) = 0,            otherwise

where k is a constant.

(a) Sketch the graph of f(x).

(b) Show that k = ½.

(c) Determine, for all x, the distribution function F(x).

In order to purify the chemical it is subjected to one of four possible purification processes, the percentage impurity in the batch determining the actual process used. The process used and its cost, for each level of percentage impurity, are shown in the table.

Percentage impurity x    Process used    Batch cost (£)
0 ≤ x < 1                A               200
1 ≤ x < 2                B               250
2 ≤ x < 3                C               350
3 ≤ x ≤ 4                D               500

(d) Determine the expected cost per batch of removing the impurities.

(e) Determine the probability that the cost of purifying a batch exceeds the expected cost. (NEAB)


14. An ironmonger is supplied with paraffin once a week. The weekly demand, X hundred litres, has the probability density function f, where

f(x) = c(1 − x)²,   0 ≤ x ≤ 1
f(x) = 0,           otherwise

where c is a constant. Find the value of c.

Find the mean value of X, and, to the nearest litre, the minimum capacity of his paraffin tank if the probability that it will be exhausted in a given week is not to exceed 0.02. (L)


15. The probability that a randomly chosen flight from Stanston Airport is delayed by more than x hours is (1/100)(x − 10)², for x ∈ ℝ, 0 ≤ x ≤ 10. No flights leave early, and none is delayed for more than ten hours. The delay, in hours, for a randomly chosen flight is denoted by X.

(a) Find the median, m, of X, correct to three significant figures.

(b) Find the cumulative distribution function, F, of X and sketch the graph of F.

(c) Find the probability density function, f, of X and sketch the graph of f.

(d) Show that E(X) = 10/3.

A random sample of two flights is taken. Find the probability that both flights are delayed by more than m hours, where m is the median of X. (C)
16. The continuous random variable X has probability density function given by

f(x) = kx,    0 ≤ x < 1,
f(x) = kx²,   1 ≤ x ≤ 2,
f(x) = 0,     otherwise.

(a) Show that k = 6/17.
(b) Find the cumulative distribution function of X.
(c) Find, correct to two decimal places, the median, m, of X.
(d) Find, correct to two decimal places, P(|X − m| < 0.75). (C)
17. Determine λ such that
0 x<0 
af2 O<x<1 
0 
fix) = 1<x<2 
3h 3A(x - 3 dex<4 
>) Z Kx< 
0 x>4 
is a probability density function of the 
distribution of a random variable X. 
Sketch the density function and find E(X) and 
P(X <3.5). (MEI) 




18. A random variable X has cumulative (distribution) function F(x) where

F(x) = 0,          x < −1
F(x) = ax + a,     −1 ≤ x < 0
F(x) = 2ax + a,    0 ≤ x < 1
F(x) = 3a,         x ≥ 1

Determine

(a) the value of a,

(b) the frequency function f(x) of X,

(c) the expected value μ of X,

(d) the standard deviation σ of X,

(e) the probability that|X—s| exceeds §.  (C) 


Mixed test 6A 


19. (a) A discrete random variable R takes integer values between 0 and 4 inclusive with probabilities given by


1 
rt (r=0,1,2) 


10 
P(R=n= Stay 


10 


(r=3,4) 


Find the expectation and variance of R. 

(b) A continuous random variable X takes values in the interval x ≥ 0. The probability density function of X is defined by


kx ifO<x<i 
fey= ud ifx>1 
x 


Prove that & = § and find the expectation 
and variance of X. (C) 


1. A survey of 491 households, in part of the Midlands, gave the following results for gross weekly income, £y.

Income (y)        No. of households
0 ≤ y < 80        68
80 ≤ y < 130      38
130 ≤ y < 170     46
170 ≤ y < 220     40
220 ≤ y < 270     50
270 ≤ y < 320     45
320 ≤ y < 400     60
400 ≤ y < 800     144

(a) Draw a histogram on graph paper to illustrate these data. Label your scales and axes clearly.

A statistician suggests that a suitable model for the gross weekly income in £100 units is the continuous random variable X with probability density function

f(x) = 3k,   0 ≤ x < 4
f(x) = k,    4 ≤ x < 8
f(x) = 0,    otherwise

where k is a constant.

(b) Find the value of k.

(c) Use this model to estimate how many of these 491 households have a gross weekly income in the range £0 to £130.

(d) Comment on your findings. (L)


2. The random variable X has a probability density 
function given by 


ak 
fe) = {RO 


O<x<1 
elsewhere 


k being a constant. Find the value of k and find 
also the mean and variance of this distribution. 
Find the median of the distribution. (O & C)


3. The amount of vegetables eaten by a family in a 
week is a random variable W kg. The probability 
density function is given by 


O<w<5 


0 otherwise 


(a) Find the cumulative distribution function 
of W. 

(b) Find, to three decimal places, the probability 
that the family eats between 2 kg and 4 kg 
of vegetables in one week. 

(c) Given that the mean of the distribution is 
34, find, to three decimal places, the 
variance of W. 

(d) Find the mode of the distribution. 

(e) Verify that the amount, m, of vegetables such that the family is equally likely to eat more or less than m in any week is about 3.431 kg.

(f) Use the information above to comment on the skewness of the distribution.


Mixed test 6B 




1. The continuous random variable X has probability density function given by

f(x) = ⅜x²,   0 ≤ x ≤ 2
f(x) = 0,     otherwise.

(a) Sketch the graph of f.

(b) Calculate the mean of X.

(c) Calculate the standard deviation of X.

(d) Show why the median value of X must be greater than the mean. (NEAB)
2. The random variable X has probability density function

f(x) = a(x − x³),   0 ≤ x ≤ 1
f(x) = 0,           otherwise

(a) Show that a = 4.
(b) Find E(X).
(c) Find the mode of the distribution of X. (L)


3. A firm has a large number of employees. The distance in miles they have to travel each day from home to work can be modelled by a continuous random variable X whose cumulative distribution function is given by

F(1) = 0
F(x) = k(1 − 1/x),   1 ≤ x ≤ b
F(b) = 1

where b represents the farthest distance anyone lives from work.

The diagram shows a sketch of this cumulative distribution function.

[Diagram: sketch of F(x)]

A survey suggests that b = 5. Use this parameter for parts (a) to (d).

(a) Show that k = 1.25.

(b) Write down and solve an equation to find the median distance travelled to work.

(c) Find the probability that an employee lives within half a mile of the median.

(d) Derive the probability density function for X and illustrate it with a sketch.

(e) Show that, for any value of b greater than 1, the median distance travelled does not exceed 2. (MEI)


The normal distribution

In this chapter you will learn how to

• standardise a normal variable and use standard normal tables
• use the normal distribution as a model to solve problems
• use the normal distribution as an approximation to the binomial distribution and to the Poisson distribution


The normal distribution is one of the most important distributions in statistics. Many measured quantities in the natural sciences follow a normal distribution, and under certain circumstances it is also a useful approximation to the binomial distribution and to the Poisson distribution.

The normal variable X is continuous. Its probability density function f(x) depends on its mean μ and standard deviation σ, where

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty$$

This is very complicated and has been included just for reference. You would not be expected to remember it!


To describe the distribution, write

X ~ N(μ, σ²)

where μ is the mean and σ² is the variance.

Notice that the description gives the variance σ², rather than the standard deviation σ.
The normal distribution curve has the following features:

• it is bell-shaped
• it is symmetrical about μ
• it extends from −∞ to +∞
• the maximum value of f(x) is 1/(σ√(2π))
• the total area under the curve is 1


Notice also that

• approximately 95% of the distribution lies within two standard deviations of the mean
• approximately 99.7% (very nearly all) of the distribution lies within three standard deviations of the mean

[Diagrams: normal curves marking μ − 2σ, μ, μ + 2σ and μ − 3σ, μ, μ + 3σ]
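These percentages are the same for every normal distribution, because they depend only on the number of standard deviations from the mean. As a sketch, Python's standard library class statistics.NormalDist (available from Python 3.8) can reproduce them; the function name within is illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal distribution

def within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {within(k):.4f}")
```

The output shows about 68% within one standard deviation, 95% within two and 99.7% within three.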


The shape of the distribution depends on σ. Here are some normal curves, each drawn to the same scale:

[Figure: curves of X ~ N(0, 1), X ~ N(4, 4) and X ~ N(50, 4); the last is centred on μ = 50 with σ = 2, on an axis running from 44 to 56]


FINDING PROBABILITIES

The probability that X lies between a and b is written P(a < X < b). To find this probability, you need to find the area under the normal curve between a and b.

One way of finding areas is to integrate, but since the normal function is complicated and very difficult to integrate, tables are used instead.
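To see that integration and tables describe the same area, the sketch below integrates the density numerically using Simpson's rule (a method chosen for this illustration, not one the text relies on) and compares the result with statistics.NormalDist.cdf. The values μ = 50, σ = 2, a = 48 and b = 53 are arbitrary choices.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal probability density function."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g from a to b (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

mu, sigma, a, b = 50, 2, 48, 53
area = simpson(lambda x: normal_pdf(x, mu, sigma), a, b)
X = NormalDist(mu, sigma)
print(f"integration: {area:.6f}, cdf: {X.cdf(b) - X.cdf(a):.6f}")
```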


THE STANDARD NORMAL VARIABLE, Z

In order to use the same set of tables for all possible values of μ and σ², the variable X is standardised so that the mean is 0 and the standard deviation is 1. Notice that since the variance is the square of the standard deviation, the variance is also 1. This standardised normal variable is called Z and Z ~ N(0, 1).

To illustrate how the variable X is standardised to the variable Z, consider X distributed normally with a mean 50 and a variance 4,

i.e. X ~ N(50, 4), so μ = 50 and σ² = 4, giving σ = 2.

The maximum value of f(x) is 1/(2√(2π)) ≈ 0.2 and the curve is shown in the right-hand section of the diagram below.

Now translate the curve 50 units to the left so that the mean is 0. This is shown in the left-hand section of the diagram. The standard deviation σ is still 2, so the maximum value is again approximately 0.2.

[Diagram: X ~ N(50, 4) on the right, translated 50 units to the left to give X − 50 ~ N(0, 4)]

Now 'squash' the curve towards the vertical axis so that the standard deviation is 1. This is done by dividing by the standard deviation (σ = 2).

[Diagram: X − 50 ~ N(0, 4) squashed towards the vertical axis to give Z ~ N(0, 1)]

You write Z = (X − 50)/2, so that Z ~ N(0, 1).

In general

To standardise X, where X ~ N(μ, σ²)

• subtract the mean μ
• then divide by the standard deviation σ

to obtain

Z = (X − μ)/σ, where Z ~ N(0, 1)
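The standardisation Z = (X − μ)/σ is a one-line function. This sketch (the function name standardise is illustrative) uses X ~ N(50, 4) from above and checks with statistics.NormalDist that P(X < x) and P(Z < z) agree, which is exactly why one set of tables is enough.

```python
from statistics import NormalDist

def standardise(x, mu, sigma):
    """Return z = (x - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma = 50, 2            # X ~ N(50, 4), so sigma = sqrt(4) = 2
x = 53
z = standardise(x, mu, sigma)

p_x = NormalDist(mu, sigma).cdf(x)   # P(X < 53) computed directly
p_z = NormalDist(0, 1).cdf(z)        # P(Z < 1.5) from the standard normal
print(z, p_x, p_z)
```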


USING STANDARD NORMAL TABLES

The standard normal tables give the area under the curve as far as a particular value z. This is written Φ(z).

Note that Φ is a Greek letter, pronounced phi.

This area gives the probability that Z is less than a particular value z, so P(Z < z) = Φ(z).

The tables are printed on page 649. On the following page is an extract from the first section.

The highlighted values are referred to in the following text. Notice that the values of Φ(z) are given to four decimal places in the tables.


z      0       1       2       3       4       5       6       7       8       9       ADD  1  2  3  4  5  6  7  8  9
0.0  0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359       4  8 12 16 20 24 28 32 36
0.1  0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753       4  8 12 16 20 24 28 32 36
0.2  0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141       4  8 12 15 19 23 27 31 35
0.3  0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517       4  7 11 15 19 22 26 30 34
0.4  0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879       4  7 11 14 18 22 25 29 32
0.5  0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224       3  7 10 14 17 20 24 27 31
0.6  0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549       3  7 10 13 16 19 23 26 29
0.7  0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852       3  6  9 12 15 18 21 24 27
0.8  0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133       3  5  8 11 14 16 19 22 25
0.9  0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389       3  5  8 10 13 15 18 20 23


(a) To find P(Z < 0.16), read off the value of Φ(0.16):

• find row 0.1 and go across to column 6. This gives 0.5636.

P(Z < 0.16) = 0.5636

(b) To find P(Z < 0.345), read off the value of Φ(0.345):

• Find the value when z = 0.34 from row 0.3, column 4. This is 0.6331.
• Now go to the right-hand section and read the number along that row in column 5. This is 19.
• Note the instruction to ADD. This means that 19 is added to the digits 6331:

  6331
+   19
  6350

P(Z < 0.345) = 0.6350

(c) To find P(Z < 0.429), read off Φ(0.429):

• Find row 0.4, column 2, which gives 0.6628, then add the number in the right-hand section, column 9, which is 32.

P(Z < 0.429) = 0.6660
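The table entries are rounded values of Φ(z), which can also be computed from the error function as Φ(z) = ½(1 + erf(z/√2)). The sketch below uses Python's math.erf; it evaluates Φ directly rather than interpolating with the ADD columns, so it should agree with the tables to about four decimal places (the last digit can occasionally differ).

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Compare with the table look-ups in (a), (b) and (c) above.
for z in (0.16, 0.345, 0.429):
    print(f"Phi({z}) = {phi(z):.4f}")
```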


When calculating probabilities, remember that the total area under the standardised normal curve is 1.

Example 7.1

Using the standard normal tables printed on page 649, find

(a) P(Z < 0.85)
(b) P(Z > 0.85)

Solution 7.1

(a) P(Z < 0.85) = Φ(0.85)
= 0.8023

(b) P(Z > 0.85) = 1 − Φ(0.85)
= 1 − 0.8023
= 0.1977

In general

P(Z > a) = 1 − Φ(a)


Finding probabilities involving negative values of z

The standard normal tables start at z = 0. You can, however, find probabilities relating to negative values of z by using the symmetrical properties of the curve. Look at these diagrams:

[Diagrams: the area to the left of −a equals the area to the right of a; the area to the right of −a equals the area to the left of a]

To find P(Z < −a), where a > 0

P(Z < −a) = Φ(−a)
= 1 − Φ(a)

To find P(Z > −a), where a > 0

P(Z > −a) = Φ(a)
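Both results follow from the symmetry of the curve about 0, that is Φ(−a) = 1 − Φ(a). A small numerical sketch with statistics.NormalDist confirms the two identities for a sample value a = 1.377 (any a > 0 would do).

```python
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal cdf, Phi(z)
a = 1.377

# P(Z < -a) = Phi(-a) = 1 - Phi(a)
print(phi(-a), 1 - phi(a))
# P(Z > -a) = 1 - Phi(-a) = Phi(a)
print(1 - phi(-a), phi(a))
```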


Example 7.2

Z ~ N(0, 1)

(a) Using the standard normal tables on page 649, find P(Z < 1.377).
(b) Drawing sketches to illustrate your answers, find
(i) P(Z > −1.377)
(ii) P(Z > 1.377)
(iii) P(Z < −1.377)
(Give your answers correct to two significant figures.)

Solution 7.2

(a) P(Z < 1.377) = Φ(1.377)
= 0.9158
= 0.92 (2 s.f.)

(b) (i) Using P(Z > −a) = P(Z < a) = Φ(a),
P(Z > −1.377) = Φ(1.377)
= 0.9158
= 0.92 (2 s.f.)

(ii) Using P(Z > a) = 1 − Φ(a),
P(Z > 1.377) = 1 − Φ(1.377)
= 1 − 0.9158
= 0.0842
= 0.084 (2 s.f.)

(iii) Using P(Z < −a) = P(Z > a) = 1 − Φ(a),
P(Z < −1.377) = 1 − Φ(1.377)
= 1 − 0.9158
= 0.0842
= 0.084 (2 s.f.)


Important results: these are worth learning.

In the following, a > 0, b > 0 and a < b.

(a) P(a < Z < b) = Φ(b) − Φ(a)

Example: P(0.345 < Z < 1.751) = Φ(1.751) − Φ(0.345)
= 0.9600 − 0.6350
= 0.3250

(b) P(−a < Z < b) = Φ(b) − Φ(−a)
= Φ(b) − (1 − Φ(a))
= Φ(a) + Φ(b) − 1

Example: P(−2.696 < Z < 1.865) = Φ(1.865) − Φ(−2.696)
= Φ(1.865) − (1 − Φ(2.696))
= Φ(1.865) + Φ(2.696) − 1
= 0.9690 + 0.9965 − 1
= 0.9655

In practice, you can write down P(−2.696 < Z < 1.865) = Φ(2.696) + Φ(1.865) − 1 and go on from there.

(c) P(−b < Z < −a) = Φ(−a) − Φ(−b)
= (1 − Φ(a)) − (1 − Φ(b))
= Φ(b) − Φ(a)

Example: P(−1.4 < Z < −0.6) = Φ(1.4) − Φ(0.6)
= 0.9192 − 0.7257
= 0.1935

(d) P(|Z| < a) = P(−a < Z < a)
= Φ(a) + Φ(a) − 1    (result (b))
= 2Φ(a) − 1

Example: P(|Z| < 1.433) = P(−1.433 < Z < 1.433)
= 2Φ(1.433) − 1
= 2 × 0.9240 − 1
= 0.8480

(e) P(|Z| > a) = P(Z < −a) + P(Z > a)
= 2(1 − Φ(a))

Example: P(|Z| > 1.433) = P(Z < −1.433) + P(Z > 1.433)
= 2(1 − Φ(1.433))
= 2(1 − 0.9240)
= 0.1520

It is also useful to remember that

P(|Z| > a) = 1 − P(|Z| < a)

Example 7.3

Z ~ N(0, 1). Show that

(a) P(−1.96 < Z < 1.96) = 0.95
(b) P(−2.575 < Z < 2.575) = 0.99


Solution 7.3

(a) P(−1.96 < Z < 1.96) = 2Φ(1.96) − 1
= 2(0.975) − 1
= 0.95

P(−1.96 < Z < 1.96) = 0.95.
The central 95% of the distribution lies between ±1.96.

(b) P(−2.575 < Z < 2.575) = 2Φ(2.575) − 1
= 2(0.995) − 1
= 0.99

P(−2.575 < Z < 2.575) = 0.99.
The central 99% of the distribution lies between ±2.575.

NOTE: These are important results which will be used in later work.
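Results (a) to (e) can be wrapped as small helper functions. The function names below are illustrative, not standard; the sketch uses statistics.NormalDist for Φ, so answers may differ from table-based working in the fourth decimal place.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal cdf, Phi(z)

def p_between(a, b):
    """P(a < Z < b) = Phi(b) - Phi(a); covers results (a), (b) and (c)."""
    return phi(b) - phi(a)

def p_abs_less(a):
    """Result (d): P(|Z| < a) = 2 Phi(a) - 1, for a > 0."""
    return 2 * phi(a) - 1

def p_abs_greater(a):
    """Result (e): P(|Z| > a) = 2(1 - Phi(a)), for a > 0."""
    return 2 * (1 - phi(a))

print(round(p_abs_less(1.96), 3))       # Example 7.3(a)
print(round(p_abs_less(2.575), 3))      # Example 7.3(b)
print(round(p_between(-1.4, -0.6), 4))  # worked example for result (c)
```

This prints 0.95, 0.99 and 0.1935, matching Example 7.3 and the example for result (c).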


Exercise 7a

Finding probabilities, where Z ~ N(0, 1)

Draw sketches to illustrate your answer and consider whether your answer is sensible.

1. If Z ~ N(0, 1), find
(a) P(Z < 0.874),
(b) P(Z > −0.874),
(c) P(Z > 0.874),
(d) P(Z < −0.874).

2. If Z ~ N(0, 1), find
(a) P(Z > 1.8),
(b) P(Z < −0.65),
(c) P(Z > −2.46),
(d) P(Z < 1.36),
(e) P(Z > 2.58),
(f) P(Z > −2.37),
(g) P(Z < 1.86),
(h) P(Z < −0.725),
(i) P(Z > 1.863),
(j) P(Z < 1.63),
(k) P(Z > −2.061),
(l) P(Z < −2.875).

3. If Z ~ N(0, 1), find
(a) P(Z > 1.645),
(b) P(Z < −1.645),
(c) P(Z > 1.282),
(d) P(Z > 1.96),
(e) P(Z > 2.575),
(f) P(Z > 2.326),
(g) P(Z > 2.808),
(h) P(Z < 1.96).

4. Z ~ N(0, 1)
Find the probabilities represented by the shaded areas in the diagrams.

[Diagrams (a) and (b): shaded areas under the standard normal curve; −1.96 is marked]

5. If Z ~ N(0, 1), find
(a) P(0.829 < Z < 1.834),
(b) P(−2.56 < Z < 0.134),
(c) P(−1.762 < Z < −0.246),
(d) P(0 < Z < 1.73),
(e) P(−2.05 < Z < 0),
(f) P(−2.08 < Z < 2.08),
(g) P(1.764 < Z < 2.567),
(h) P(−1.65 < Z < 1.725),
(i) P(−0.98 < Z < −0.16),
(j) P(Z < −1.97 or Z > 2.5),
(k) P(|Z| < 1.78),
(l) P(|Z| > 0. a8),
(m) P(−1.645 < Z < 1.645),
(n) P(|Z| > 2.326).

6. Z ~ N(0, 1)
Complete this statement:
The central ...% of the distribution lies between ±0.674.

[Diagram: standard normal curve with −0.674 and 0.674 marked]

7. Z ~ N(0, 1) and P(Z < a) = 0.3, P(a < Z < b) = 0.6.
Find
(a) P(Z < b),
(b) P(Z > a).

8. Z ~ N(0, 1) and P(Z < a) = 0.7, P(Z > b) = 0.45.
Find
(a) Φ(b),
(b) P(b < Z < a).

9. Z ~ N(0, 1) and P(|Z| < a) = 0.8.
Find
(a) P(Z < a),
(b) P(Z > a).
USING STANDARD NORMAL TABLES FOR ANY NORMAL VARIABLE X

Remember that to standardise X, where X ~ N(μ, σ²),

• subtract the mean μ
• then divide by the standard deviation σ

to give

Z = (X − μ)/σ, where Z ~ N(0, 1)

The procedure is illustrated in the following example.

Example 7.4

Lengths of metal strips produced by a machine are normally distributed with mean length of 150 cm and a standard deviation of 10 cm. Find the probability that the length of a randomly selected strip is

(a) shorter than 165 cm,
(b) within 5 cm of the mean.

Solution 7.4

X is the length, in centimetres, of a metal strip. Since μ = 150 and σ = 10, X ~ N(150, 10²).

(a) You need to find the probability that the length is shorter than 165 cm, i.e. P(X < 165).

To be able to use the standard normal tables, standardise the X variable by subtracting the mean, 150, then dividing by the standard deviation, 10. Apply this to both sides of the inequality X < 165.

(X − 150)/10 becomes Z, where Z ~ N(0, 1)
165 becomes (165 − 150)/10 = 1.5,
so P(X < 165) becomes P(Z < 1.5).

P(X < 165) = P(Z < 1.5)
= Φ(1.5)
= 0.9332
= 0.93 (2 s.f.)

The probability that the length is shorter than 165 cm is 0.93.

NOTE: Although the X and Z distributions have different spreads, in practice it is convenient to show the values for both distributions on one sketch.

[Sketch: x values 150 and 165 marked with the corresponding z values 0 and 1.5]

(b) To find the probability that the length is within 5 cm of the mean, you need to find P(|X − 150| < 5). Dividing by the standard deviation gives P(|X − 150|/10 < 5/10), i.e. P(|Z| < 0.5).

P(|Z| < 0.5) = 2Φ(0.5) − 1    (result (d), page 366)
= 2 × 0.6915 − 1
= 0.383
= 0.38 (2 s.f.)

The probability that the length is within 5 cm of the mean is 0.38.

Example 7.5

The time taken by a milkman to deliver milk to the High Street is normally distributed with a mean of 12 minutes and a standard deviation of 2 minutes. He delivers milk every day. Estimate the number of days during the year when he takes

(a) longer than 17 minutes,
(b) less than ten minutes,
(c) between nine and 13 minutes.

Solution 7.5

X is the time, in minutes, taken to deliver milk to the High Street.
X ~ N(12, 2²)

Standardise X using Z = (X − μ)/σ, i.e. Z = (X − 12)/2.

(a) P(X > 17) = P(Z > (17 − 12)/2)
= P(Z > 2.5)
= 1 − Φ(2.5)
= 1 − 0.9938
= 0.0062

To find the number of days, multiply by 365.

365 × 0.0062 = 2.263 ≈ 2

On two days in the year he takes longer than 17 minutes.

(b) P(X < 10) = P(Z < (10 − 12)/2)
= P(Z < −1)
= 1 − Φ(1)
= 1 − 0.8413
= 0.1587

Now 365 × 0.1587 = 57.92 ≈ 58

On 58 days in the year he takes less than ten minutes.

(c) P(9 < X < 13) = P((9 − 12)/2 < Z < (13 − 12)/2)
= P(−1.5 < Z < 0.5)
= Φ(0.5) + Φ(1.5) − 1
= 0.6915 + 0.9332 − 1
= 0.6247

Now 365 × 0.6247 = 228.01 ≈ 228

On 228 days in the year he takes between nine and 13 minutes.

NOTE: Since X is a continuous variable, the following are indistinguishable:

9 < X < 13,   9 ≤ X < 13,   9 < X ≤ 13,   9 ≤ X ≤ 13

Exercise 7b

Finding probabilities using X ~ N(μ, σ²)


1. The masses of packages from a particular machine are normally distributed with a mean of 200 g and a standard deviation of 2 g. Find the probability that a randomly selected package from the machine weighs
(a) less than 197 g,
(b) more than 200.5 g,
(c) between 198.5 g and 199.5 g.


2. The heights of boys at a particular age follow a normal distribution with mean 150.3 cm and variance 25 cm².
Find the probability that a boy picked at random from this age group has height
(a) less than 153 cm,
(b) more than 158 cm,
(c) between 150 cm and 158 cm,
(d) more than 10 cm difference from the mean height.


3. X ~ N(300, 25)
Find the probabilities represented by the shaded areas in the diagrams:

[Diagrams (a) to (c): shaded areas under the curve of X, with 289, 295 and 300 marked on the axis]

4. The random variable X is distributed normally such that X ~ N(50, 20). Find
(a) P(X > 60.3),
(b) P(X < 59.8).

5. X ~ N(−8, 12). Find
(a) P(X < −9.8),
(b) P(X > −8.2),
(c) P(−7 < X < 0.5).

6. The masses of a certain type of cabbage are normally distributed with a mean of 1000 g and a standard deviation of 0.15 kg.
In a batch of 800 cabbages, estimate how many have a mass between 750 g and 1290 g.

7. The number of hours of life of a torch battery is normally distributed with a mean of 150 hours and standard deviation of 12 hours. In a quality control test, two batteries are chosen at random from a batch. If both batteries have a life less than 120 hours, the batch is rejected.
Find the probability that the batch is rejected.

8. Cartons of milk from a particular supermarket are advertised as containing 1 litre, but in fact the volume of the contents is normally distributed with a mean of 1012 ml and a standard deviation of 5 ml.
(a) Find the probability that a randomly chosen carton contains more than 1010 ml.
(b) In a batch of 1000 cartons, estimate the number of cartons that contain less than the advertised volume of milk.
advertised volume of milk. 
9. A random variable X is such that X ~ N(−5, 9).
(a) Find the probability that a randomly chosen item from the population will have a positive value.
(b) Find the probability that out of ten items chosen at random, exactly four will have a positive value.

10. X ~ N(100, 81). Find
(a) P(|X − 100| < 18),
(b) P(|X − 100| > 5),
(c) P(12 < X − 100 < 15).

11. The life of a certain make of electric light bulb is known to be normally distributed with a mean life of 2000 hours and a standard deviation of 120 hours. Estimate the probability that the life of such a bulb will be
(a) greater than 2150 hours,
(b) greater than 1910 hours,
(c) within the range 1850 hours to 2090 hours. (C)

12. The weights of vegetable marrows supplied to retailers by a wholesaler have a normal distribution with mean 1.5 kg and standard deviation 0.6 kg. The wholesaler supplies three sizes of marrow:
Size 1, under 0.9 kg,
Size 2, from 0.9 kg to 2.4 kg,
Size 3, over 2.4 kg.
Find, to three decimal places, the proportions of marrows in the three sizes.
The prices of the marrows are 16p for Size 1, 40p for Size 2 and 60p for Size 3. Calculate the expected total cost of 100 marrows chosen at random from those supplied. (L)

13. The random variable Y is such that Y ~ N(8, 25). Show that, correct to three decimal places, P(|Y − 8| < 6.2) = 0.785.
Three random observations of Y are made. Find the probability that exactly two observations will lie in the interval defined by |Y − 8| < 6.2. (C)

14. The manufacturers of a new model of car state that, when travelling at 56 miles per hour, the petrol consumption has a mean value of 32.4 miles per gallon with standard deviation 1.4 miles per gallon.
Assuming a normal distribution, calculate the probability that a randomly chosen car of that model will have a petrol consumption greater than 30 miles per gallon when travelling at 56 miles per hour. (C)


USING THE STANDARD NORMAL TABLES IN REVERSE TO FIND z 
WHEN 4(z) IS KNOWN 


The procedure is illustrated below using the extract taken from the standard normal tables. 
The highlighted values are referred to in the examples. 


23 4 § 6 e) 

2 3 4 S 6 7 § 9 ADD 
(a) 1.5] 0.9332] 0.9345 0.9357 0.9370} 0.9382 0.9394 [0.9406] 0.9418 0.9429 0.9441] 1 2 4;5 6 7) 8 40 11 
18] 0.9452) 0.9463 0.9474 0.9484] 0.9495 0.9505 0.9515| 0.9525 0.9535 0.9545|1 2°314 5 617 8 9 
{b) 3.7} 0.9554] 0.9564 [0.9573 0.9582] 0.9591 0.9599 0.9608] 0.9616 0.9625 0.9633} 1 2 314 4 5 |[6 7 8 
1.8] 0.9641} 0.9649 0.9656 0.9664] 0.9671 0.9678 0.9686] 0.9693 0.9699 0.970611 1213 4 4 s 6 6 
| 0.9713] 0.9719 0.9726 0.9732} 0.9738 0.9744 0.9750] 0.9756 0.9761 0.9767|1 1 212 3 4 4 5 § 
2.) 0.9772] 0.9778 0.9783 0.9788] 0.9793 0.9798 0.9803] 0.9808 0.9812 0.9817}0 1 1/2 2 3] 3 4 4 
0.9821} 0.9826 [0.9830] 0.9834] 0.9838 0.9842 0.9846] 0.9850 0.9854 0.9857/0 1 1 If] {2 Zit 3 304 
2.2} 0.9861] 0.9864 0.9868 0.9871] 0.9875 0.9878 0.9881] 0.9884 0.9887 0.9890/0 1 1/1 2 2 3.3 


(a) If you are given that &(z) = 0.9406, iy bai06. ol 
to find z, look for 0.9406 in the main body of the table. : é 
This occurs when z= 1.56, 
so if O{z) = 0.9406, then z= 1.56. 

Using notation similar to the used in trigonometry where, 
for example, if sin 6 = 0.82, then @ = sin 0.82, you could 4 
write OF Se 
®-'(0.9406) = 1.56 

This means that the value of z such that ®(z) = 0.9046 is 1.56. 


(b) Find z if P(Z < z) = 0.9579.
Φ(z) = 0.9579
so z = Φ⁻¹(0.9579)
Look for 0.9579 in the main body of the table. It does not appear, so look for the nearest value below it. This is 0.9573 and it occurs when z = 1.72.
To get the digits 9579 you need to add 6 to 9573. Look at the far right-hand (ADD) section of the row for 1.7 and find 6. It is in column 7. This means that the z value required is 1.727.
So z = Φ⁻¹(0.9579) = 1.727.
(c) Find z if P(Z < z) = 0.9832.
Φ(z) = 0.9832
so z = Φ⁻¹(0.9832)
Look for 0.9832 in the main body of the table; note that Φ(2.12) = 0.9830.
Refer to the ADD section at the end of the row for 2.1 and you find that

Φ(2.124) = 0.9832
Φ(2.125) = 0.9832
Φ(2.126) = 0.9832

The probabilities have been given to four decimal places and it is not possible to distinguish between these z values, so just decide on one of them, say 2.124.

So z = Φ⁻¹(0.9832) = 2.124.




NOTE: If you cannot find the value for the probability in the table, choose the value that is closest to the required probability. Often final answers are given to two or three significant figures, so these discrepancies are not important.
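Where software is available, the table look-ups above can be checked numerically. The sketch below is only an illustration and assumes Python 3.8 or later, whose standard `statistics.NormalDist` class has an `inv_cdf` method that computes Φ⁻¹; the helper name `inverse_phi` is our own.

```python
from statistics import NormalDist

# Standard normal distribution Z ~ N(0, 1)
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def inverse_phi(p: float) -> float:
    """Return z such that Phi(z) = p, i.e. z = Phi^{-1}(p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return STANDARD_NORMAL.inv_cdf(p)


# The three look-ups worked through above
for p in (0.9406, 0.9579, 0.9832):
    print(f"Phi^-1({p}) is approximately {inverse_phi(p):.3f}")
```

Working to full precision gives Φ⁻¹(0.9832) ≈ 2.125, which lies inside the range 2.124 to 2.126 that the four-figure tables cannot separate.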


Example 7.6
If Z ~ N(0, 1), find the value of a if
(a) P(Z < a) = 0.9693
(b) P(Z > a) = 0.3802
(c) P(Z > a) = 0.7367
(d) P(Z < a) = 0.0793


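Solution 7.6
(a) P(Z < a) = 0.9693
Φ(a) = 0.9693
a = Φ⁻¹(0.9693)
= 1.87

(b) P(Z > a) = 0.3802
Φ(a) = 1 − 0.3802 = 0.6198
a = Φ⁻¹(0.6198)
= 0.305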


(c) P(Z > a) = 0.7367
Since the probability is greater than 0.5, a must be negative, and therefore −a is positive.
Using symmetry, Φ(−a) = 0.7367
−a = Φ⁻¹(0.7367)
= 0.633
a = −0.633


(d) P(Z < a) = 0.0793
Since the probability is less than 0.5, a must be negative.
Using symmetry,
Φ(−a) = 1 − 0.0793
= 0.9207
−a = Φ⁻¹(0.9207)
= 1.41
a = −1.41
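The symmetry arguments in Example 7.6 can be checked in the same way. This is a minimal sketch, assuming Python's standard `statistics.NormalDist`; the function names are illustrative only. It uses P(Z > a) = 1 − Φ(a) for an upper tail and, for a lower-tail probability below 0.5, mirrors the table method Φ(−a) = 1 − p.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal: mean 0, standard deviation 1


def a_from_lower_tail(p: float) -> float:
    """a such that P(Z < a) = p."""
    return PHI.inv_cdf(p)


def a_from_upper_tail(p: float) -> float:
    """a such that P(Z > a) = p, using P(Z > a) = 1 - Phi(a)."""
    return PHI.inv_cdf(1.0 - p)


def a_by_symmetry(p: float) -> float:
    """Table-style method for P(Z < a) = p with p < 0.5:
    Phi(-a) = 1 - p, so -a = Phi^{-1}(1 - p)."""
    return -PHI.inv_cdf(1.0 - p)


print(round(a_from_lower_tail(0.9693), 3))   # part (a)
print(round(a_from_upper_tail(0.3802), 3))   # part (b)
print(round(a_from_upper_tail(0.7367), 3))   # part (c)
print(round(a_by_symmetry(0.0793), 3))       # part (d)
```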


Example 7.7 
If Z ~ N(0, 1), find a such that P(|Z| < a) = 0.9.


Solution 7.7 
P(|Z| < a) = 0.9,
i.e. P(−a < Z < a) = 0.9.
From symmetry, using result (d) on page 366,
2Φ(a) − 1 = 0.9
2Φ(a) = 1.9
Φ(a) = 0.95
a = Φ⁻¹(0.95)
= 1.645
This means that the central 90% of the standard normal distribution lies between ±1.645.

Alternatively,
if P(−a < Z < a) = 0.9,
then the value of a corresponds to an upper tail probability of 0.05, and a lower tail probability of 0.95.
Φ(a) = 0.95
a = Φ⁻¹(0.95)
= 1.645
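The result 2Φ(a) − 1 = c rearranges to Φ(a) = (1 + c)/2, which gives a one-line way to find the half-width of any central interval. A minimal sketch, assuming Python's standard `statistics.NormalDist`; the name `central_half_width` is hypothetical.

```python
from statistics import NormalDist


def central_half_width(c: float) -> float:
    """Return a > 0 such that P(-a < Z < a) = c for Z ~ N(0, 1).

    From 2*Phi(a) - 1 = c it follows that Phi(a) = (1 + c) / 2.
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    return NormalDist().inv_cdf((1.0 + c) / 2.0)


a = central_half_width(0.9)
std = NormalDist()
print(round(a, 3))                           # half-width of the central 90%
print(round(std.cdf(a) - std.cdf(-a), 4))    # check: should be 0.9
```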
USING THE TABLES IN REVERSE FOR ANY NORMAL VARIABLE X

Example 7.8
The heights of female students at a particular college are normally distributed with a mean of 169 cm and a standard deviation of 9 cm.
(a) Given that 80% of these female students have a height less than b cm, find the value of b.
(b) Given that 60% of these female students have a height greater than s cm, find the value of s.

Solution 7.8
X is the height, in centimetres, of a female student.
X ~ N(169, 9²)
(a) Given P(X < b) = 0.8
Standardising,
P(Z < (b − 169)/9) = 0.8
(b − 169)/9 = Φ⁻¹(0.8)
= 0.842
b − 169 = 9 × 0.842
b = 169 + 9 × 0.842
= 176.578
= 176.6 (1 d.p.)
(b) Given P(X > s) = 0.6
Standardising,
P(Z > (s − 169)/9) = 0.6
Let z = (s − 169)/9, so that P(Z > z) = 0.6.
Since the probability is greater than 0.5, z must be negative
and Φ(−z) = 0.6
−z = Φ⁻¹(0.6)
= 0.253
z = −0.253
(s − 169)/9 = −0.253
s = 169 − 9 × 0.253
= 166.723
= 166.7 (1 d.p.)

Example 7.9
The marks of 500 candidates in an examination are normally distributed with a mean of 45 marks and a standard deviation of 20 marks.
(a) Given that the pass mark is 41, estimate the number of candidates who passed the examination.
(b) If 5% of the candidates obtain a distinction by scoring x marks or more, estimate the value of x.
(c) Estimate the interquartile range of the distribution. (L Additional)

Solution 7.9
X is the examination mark.
X ~ N(45, 20²)
(a) P(X > 41) = P(Z > (41 − 45)/20)
= P(Z > −0.2)
= Φ(0.2)
= 0.5793
P(pass) = 0.5793
Since there are 500 candidates, to find the number of candidates who pass, multiply the probability by 500.
500 × 0.5793 = 289.65
290 candidates passed.
(b) P(X > x) = 0.05
Writing z for the standardised value of x,
P(Z > z) = 0.05 where z = (x − 45)/20
Φ(z) = 1 − 0.05
= 0.95
z = Φ⁻¹(0.95)
= 1.645
(x − 45)/20 = 1.645
x = 45 + 20 × 1.645
= 77.9
= 78 (2 s.f.)
A distinction is awarded for a mark of 78 or more.
(c) The interquartile range encloses the central 50% of the distribution between the lower quartile, q₁, and the upper quartile, q₃.
If P(−z < Z < z) = 50%, then z corresponds to an upper tail probability of 25%.
So Φ(z) = 0.75
z = Φ⁻¹(0.75)
= 0.674
Now z is the standardised value of the upper quartile, q₃, so
(q₃ − 45)/20 = 0.674
q₃ = 45 + 20 × 0.674
= 58.48
The lower quartile, q₁, is such that (q₁ − 45)/20 = −0.674
q₁ = 45 − 20 × 0.674
= 31.52
Interquartile range = q₃ − q₁
= 58.48 − 31.52
= 26.96
= 27 (2 s.f.)
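Examples 7.8 and 7.9 both rest on x = μ + σ Φ⁻¹(p) for X ~ N(μ, σ²). The sketch below reproduces them as a check; it assumes Python's standard `statistics.NormalDist`, and the helper `value_below` is a hypothetical name of our own.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def value_below(mu: float, sigma: float, p: float) -> float:
    """x such that P(X < x) = p for X ~ N(mu, sigma^2).

    Standardising, (x - mu) / sigma = Phi^{-1}(p), so x = mu + sigma * Phi^{-1}(p).
    """
    return mu + sigma * Z.inv_cdf(p)


# Example 7.8
b = value_below(169, 9, 0.8)          # P(X < b) = 0.8
s = value_below(169, 9, 1 - 0.6)      # P(X > s) = 0.6, i.e. P(X < s) = 0.4

# Example 7.9
marks = NormalDist(mu=45, sigma=20)
passes = 500 * (1 - marks.cdf(41))    # expected number scoring more than 41
x = value_below(45, 20, 0.95)         # distinction boundary
iqr = value_below(45, 20, 0.75) - value_below(45, 20, 0.25)

print(f"b = {b:.1f}, s = {s:.1f}")
print(f"passes = {passes:.0f}, x = {x:.1f}, IQR = {iqr:.2f}")
```

The full-precision interquartile range, about 26.98, differs slightly from the 26.96 obtained with the three-figure value 0.674, but both round to 27.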


Exercise 7c Using the standard normal tables in reverse

1. In the following, find the value of z, where Z ~ N(0, 1).
(a) P(Z < z) = 0.506
(b)
(c) P(Z < z) = 0.0296
(d) P(Z < z) = 0.325
(e) P(Z > z) = 0.713
(f) P(Z > z) = 0.154
(g) P(|Z| < z) = 0.6

2. Z ~ N(0, 1).
Find the value of a, where
(a) P(Z < a) = 0.9738
(b) P(Z < a) = 0.2435
(c) P(Z > a) = 0.82
(d) P(Z > a) = 0.2351

3. Z ~ N(0, 1).
Find a if
(a) P(|Z| < a) = 0.6372,
(b) P(|Z| > a) = 0.097,
(c) P(|Z| < a) = 0.5,
(d) P(|Z| > a) = 0.0404.

4. If Z ~ N(0, 1), find the upper quartile and the lower quartile of the distribution. Find also the 70th percentile.

5. Find x in each of the following.
(a) X ~ N(60, 25), P(X < x) = 0.9972
(b) X ~ N(5, 3), P(X < x) = 0.3
(c) X ~ N(200, 36), P(X > x) = 0.9386
(d) P(X > x) = 0.2315

6. Bags of flour packed by a particular machine have masses which are normally distributed with mean 500 g and standard deviation 20 g.
2% of the bags are rejected for being underweight and 1% of the bags are rejected for being overweight. Between what range of values should the mass of a bag of flour lie if it is to be accepted?

7. The masses of cos lettuces sold at a market are normally distributed with mean mass 600 g and standard deviation 20 g.
(a) If a lettuce is chosen at random, find the probability that its mass lies between 570 g and 610 g.
(b) Find the mass exceeded by 7% of the lettuces.
(c) In one day, 1000 lettuces are sold. Estimate how many weigh less than 545 g.


8. A sample of 100 apples is taken from a load.
The apples have the following distribution of sizes:


Diameter to nearest cm.) 6:0. 2ey Be G10. 


Frequency. Td. 2 3817-13 


Determine the mean and standard deviation of these diameters.

Assuming that the distribution is approximately normal with this mean and this standard deviation, find the range of size of apples for packing, if 5% are to be rejected as too small and 5% are to be rejected as too large. (O & C)


9. X ~ N(400, 64).
(a) Find the limits within which the central 95% of the distribution lies.
(b) Find the interquartile range of the distribution.

10. The lengths of metal strips are normally distributed with a mean of 120 cm and a standard deviation of 10 cm. Find the probability that a strip selected at random has a length
(a) greater than 105 cm,
(b) within 5 cm of the mean.
Strips that are shorter than L cm are rejected. Estimate the value of L, correct to one decimal place, if 9% of all strips are rejected.
In a sample of 500 strips, estimate the number having a length over 126 cm. (C)

11. The numbers of shirts sold in a week by the world's largest menswear store are normally distributed with a mean of 2080 and a standard deviation of 50. Estimate
(a) the probability that in a given week fewer than 2000 shirts are sold,
(b) the number of weeks in a year that between 2060 and 2130 shirts are sold,
(c) the interquartile range of the distribution,
(d) the least number x of shirts such that the probability that more than x are sold in a given week is less than 0.02. (C)

12. Batteries for a transistor radio have a mean life under normal usage of 160 hours, with a standard deviation of 30 hours. Assuming the battery life follows a normal distribution,
(a) calculate the percentage of batteries which have a life between 150 hours and 180 hours,
(b) calculate the range, symmetrical about the mean, within which 75% of the battery lives lie.
If a radio takes four of these batteries and requires all of them to be working, calculate
(c) the probability that the radio will run for at least 135 hours. (O & C)
FINDING THE VALUE OF μ OR σ OR BOTH

Example 7.10
The lengths of certain items follow a normal distribution with mean μ cm and standard deviation 6 cm. It is known that 4.78% of the items have length greater than 82 cm. Find the value of the mean μ.

Solution 7.10
X is the length, in centimetres, of an item.
X ~ N(μ, 6²) and P(X > 82) = 0.0478
Since P(X > 82) is less than 0.5, 82 must be greater than μ.
P(X > 82) = P(Z > z) where z = (82 − μ)/6
so P(Z > z) = 0.0478
Φ(z) = 1 − 0.0478
= 0.9522
z = Φ⁻¹(0.9522)
= 1.667
(82 − μ)/6 = 1.667
82 − μ = 1.667 × 6
μ = 82 − 1.667 × 6 = 71.998
The mean μ = 72 cm (2 s.f.)
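When σ is known, the method of Example 7.10 rearranges (x − μ)/σ = Φ⁻¹(1 − p) to give μ directly. A minimal sketch, assuming Python's standard `statistics.NormalDist`; the function name is illustrative only.

```python
from statistics import NormalDist


def mean_from_upper_tail(x: float, sigma: float, p_upper: float) -> float:
    """Solve P(X > x) = p_upper for mu, where X ~ N(mu, sigma^2) and sigma is known.

    (x - mu) / sigma = Phi^{-1}(1 - p_upper), so mu = x - sigma * Phi^{-1}(1 - p_upper).
    """
    z = NormalDist().inv_cdf(1.0 - p_upper)
    return x - sigma * z


mu = mean_from_upper_tail(82, 6, 0.0478)
print(round(mu, 2))
# Round-trip check: the fitted distribution should put 4.78% of items above 82 cm
print(round(1 - NormalDist(mu, 6).cdf(82), 4))
```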


Example 7.11 


X ~ N(100, σ²) and P(X < 106) = 0.8849.
Find the value of the standard deviation σ.

Solution 7.11
P(X < 106) = 0.8849
so P(Z < z) = 0.8849 where z = (106 − 100)/σ = 6/σ
Φ(z) = 0.8849
z = Φ⁻¹(0.8849)
= 1.2
6/σ = 1.2
σ = 6/1.2 = 5
The standard deviation σ = 5.
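Example 7.11 is the mirror image of Example 7.10: with μ known, σ = (x − μ)/Φ⁻¹(p). The sketch below assumes Python's standard `statistics.NormalDist`; the function name and the error checks are our own additions.

```python
from statistics import NormalDist


def sd_from_lower_tail(x: float, mu: float, p_lower: float) -> float:
    """Solve P(X < x) = p_lower for sigma, where X ~ N(mu, sigma^2) and mu is known.

    (x - mu) / sigma = Phi^{-1}(p_lower), so sigma = (x - mu) / Phi^{-1}(p_lower).
    """
    z = NormalDist().inv_cdf(p_lower)
    if z == 0.0:
        raise ValueError("p_lower = 0.5 only says x is the mean; it cannot fix sigma")
    sigma = (x - mu) / z
    if sigma <= 0:
        raise ValueError("x and p_lower are inconsistent with a positive sigma")
    return sigma


sigma = sd_from_lower_tail(106, 100, 0.8849)
print(round(sigma, 3))
```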


Example 7.12 


The masses of boxes of oranges are normally distributed such that 30% of them are greater than 4.00 kg and 20% are greater than 4.53 kg. Estimate the mean and standard deviation of the masses. (C)

Solution 7.12
X is the mass, in kilograms, of a box of oranges.
X ~ N(μ, σ²) where μ and σ are to be found.

P(X > 4.00) = 0.3
P(Z > z) = 0.3 where z = (4.00 − μ)/σ
Φ(z) = 1 − 0.3
= 0.7
z = Φ⁻¹(0.7)
= 0.524
(4.00 − μ)/σ = 0.5240
4.00 = μ + 0.5240σ ...... (1)

P(X > 4.53) = 0.2
P(Z > z) = 0.2 where z = (4.53 − μ)/σ
Φ(z) = 1 − 0.2
= 0.8
z = Φ⁻¹(0.8)
= 0.842
(4.53 − μ)/σ = 0.842
4.53 = μ + 0.8420σ ...... (2)

Equation (2) − equation (1) gives
0.53 = 0.3180σ
σ = 1.666... = 1.67 (3 s.f.)

Substituting in equation (1),
4.00 = μ + 0.524 × 1.666...
μ = 3.126... = 3.13 (3 s.f.)

μ = 3.13 kg and σ = 1.67 kg
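Each percentile condition gives a linear equation x = μ + zσ, so the pair of simultaneous equations in Example 7.12 can be solved in general. This is a sketch under our own assumptions (Python's standard `statistics.NormalDist`, a hypothetical function name, both conditions rewritten as lower-tail probabilities).

```python
from statistics import NormalDist


def mu_sigma_from_two_points(x1, p1, x2, p2):
    """Find mu and sigma for X ~ N(mu, sigma^2) given P(X < x1) = p1 and P(X < x2) = p2.

    Each condition gives x = mu + z * sigma with z = Phi^{-1}(p);
    subtracting the two equations eliminates mu.
    """
    std = NormalDist()
    z1, z2 = std.inv_cdf(p1), std.inv_cdf(p2)
    if z1 == z2:
        raise ValueError("the two probabilities must differ")
    sigma = (x2 - x1) / (z2 - z1)
    if sigma <= 0:
        raise ValueError("data are inconsistent with a positive sigma")
    mu = x1 - z1 * sigma
    return mu, sigma


# Example 7.12: P(X > 4.00) = 0.3 and P(X > 4.53) = 0.2,
# i.e. P(X < 4.00) = 0.7 and P(X < 4.53) = 0.8
mu, sigma = mu_sigma_from_two_points(4.00, 0.7, 4.53, 0.8)
print(f"mu = {mu:.3f} kg, sigma = {sigma:.3f} kg")
```

With full-precision z values the sketch gives μ ≈ 3.12 kg and σ ≈ 1.67 kg; the worked solution's 3.13 kg comes from using the rounded table values 0.524 and 0.842.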


Example 7.13 
The speeds of cars passing a certain point on a motorway can be taken to be normally distributed. Observations show that of cars passing the point, 95% are travelling at less than 85 m.p.h. and 10% are travelling at less than 55 m.p.h.
(a) Find the average speed of the cars passing the point.
(b) Find the proportion of cars that travel at more than 70 m.p.h. (L)

Solution 7.13
X is the speed, in m.p.h., of a car passing a certain point.
X ~ N(μ, σ²).
(a) P(X < 85) = 0.95
i.e. P(Z < z₁) = 0.95 where z₁ = (85 − μ)/σ
z₁ = Φ⁻¹(0.95)
= 1.645
(85 − μ)/σ = 1.645, so 85 = μ + 1.645σ ...... (1)
P(X < 55) = 0.1
i.e. P(Z < z₂) = 0.1 where z₂ = (55 − μ)/σ
Φ(−z₂) = 0.9
−z₂ = Φ⁻¹(0.9)
= 1.282
(55 − μ)/σ = −1.282, so 55 = μ − 1.282σ ...... (2)
(1) − (2) gives 30 = 2.927σ
σ = 10.249...
Substituting in (1), 85 = μ + 1.645 × 10.249...
μ = 68.139...
The average speed is 68 m.p.h. (2 s.f.).
(b) P(X > 70) = P(Z > (70 − 68.139...)/10.249...)
= P(Z > 0.1815)
= 1 − Φ(0.1815)
= 0.428...
The proportion of cars travelling at more than 70 m.p.h. is 0.43 (2 s.f.).
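Both parts of Example 7.13 can be checked numerically by solving the two equations (1) and (2) and then evaluating the upper tail at 70 m.p.h. A minimal sketch, assuming Python's standard `statistics.NormalDist`; variable names are our own.

```python
from statistics import NormalDist

std = NormalDist()

# P(X < 85) = 0.95 and P(X < 55) = 0.10 give two equations x = mu + z * sigma
z_85 = std.inv_cdf(0.95)   # positive, about 1.645
z_55 = std.inv_cdf(0.10)   # negative, about -1.282

sigma = (85 - 55) / (z_85 - z_55)
mu = 85 - z_85 * sigma

# Part (b): proportion travelling faster than 70 m.p.h.
p_over_70 = 1 - NormalDist(mu, sigma).cdf(70)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, P(X > 70) = {p_over_70:.3f}")
```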


Exercise 7d Finding μ or σ or both, where X ~ N(μ, σ²)

You are advised to draw sketches to illustrate your answers.

1. The random variable X is distributed N(μ, σ²), with σ = 25.
If P(X < 27.5) = 0.3085, find the value of μ.

2. The random variable X is normally distributed with a mean of 45. The probability that X is greater than 51 is 0.288. Find the standard deviation of the distribution.

3. The volumes of drinks in cans are normally distributed with a mean of 333 ml.
Given that 20% of the cans contain more than 340 ml, find the standard deviation of the volume of drink in a can. Find also the percentage of cans that contain less than 330 ml.

4. The random variable X is distributed N(μ, 12) and it is known that P(X > 32) = 0.8438. Find the value of μ.

5. The heights, measured in metres, of 500 people are normally distributed with a standard deviation of 0.080 m. Given that the heights of 129 of these people are greater than the mean height, but less than 1.806 m, estimate the mean height. (C)

6. The random variable X is distributed N(μ, σ²).
P(X > 80) = 0.0113 and P(X < 30) = 0.0287.
Find μ and σ.

7. The masses of boxes of apples are normally distributed such that 20% of them are greater than 5.08 kg and 15% are greater than 5.62 kg. Estimate the mean and standard deviation of the masses.

8. Metal rods produced by a machine have lengths that are normally distributed.
2% of the rods are rejected as being too short and 5% are rejected as being too long.
(a) Given that the least and greatest acceptable lengths of the rods are 6.32 cm and 7.52 cm, calculate the mean and variance of the lengths of the rods.
(b) If ten rods are chosen at random from a batch produced by the machine, find the probability that exactly three of them are rejected as being too long.

9. The random variable X is distributed N(μ, σ²).
P(X < 35) = 0.2 and P(35 < X < 45) = 0.65.
Find μ and σ.

10. The marks in an examination were found to be normally distributed.
10% of the candidates were awarded a distinction for obtaining over 75.
20% of the candidates failed the examination with a mark of under 40. Find the mean and standard deviation of the distribution of marks.

11. A farmer cuts hazel twigs to make into bean poles to sell at the market. He says that a stick is 240 cm long. In fact the lengths of the sticks are normally distributed and 55% are over 240 cm long. 10% are over 250 cm long.
Find the probability that a randomly selected stick is shorter than 235 cm.

12. The diameters of bolts produced by a particular machine follow a normal distribution with mean 1.34 cm and standard deviation 0.04 cm. A bolt is rejected if its diameter is less than 1.24 cm or more than 1.40 cm. Find the percentage of bolts which are accepted.
The setting of the machine is altered so that the mean diameter changes but the standard deviation remains the same. With the new setting, 3% of the bolts are rejected because they are too large in diameter. Find the new mean diameter of bolts produced by the machine. Find also the percentage of bolts that are now rejected because they are too small in diameter.

13. Tea is sold in packages marked 750 g. The masses of the packages are normally distributed with a mean of 760 g. It is known that less than 1% of packages are underweight. What is the maximum value of the standard deviation of the distribution?

14. The random variable X is normally distributed. The probability that X is less than 53 is 0.04 and the probability that X is less than 65 is 0.97. Find the interquartile range of the distribution.

15. A certain make of car tyre can be safely used for 25 000 km on average before it is replaced. The makers guarantee to pay compensation to anyone whose tyre does not last for 22 000 km. They expect 7.5% of all tyres sold to qualify for compensation. Assuming that the distance X travelled before a tyre is replaced has a normal probability distribution, draw a diagram illustrating the facts given above.
Calculate, to three significant figures, the standard deviation of X.
Estimate the number of tyres per 1000 which will not have been replaced when they have covered 26 500 km. (L Additional)
16. Two firms, Goodline and Megadelay, produce delay lines for use in communications. The delay time for a delay line is measured in nanoseconds (ns).
(a) The delay times for the output of Goodline may be modelled by a normal distribution with mean 283 ns and standard deviation 8 ns. What is the probability that the delay time of one line selected at random from Goodline's output is between 275 ns and 286 ns?
(b) It is found that, in the output of Megadelay, 10% of the delay times are less than 274.6 ns and 7.5% are more than 288.2 ns. Again assuming a normal distribution, calculate the mean and standard deviation of the delay times for Megadelay. Give your answers correct to three significant figures. (C)

17. Machine components are mass-produced at a factory. A customer requires that the components should be 5.2 cm long but they will be acceptable if they are within limits 5.195 cm to 5.205 cm. The customer tests the components and finds that 10.75% of those supplied are over-size and 4.95% are under-size. Find the mean and standard deviation of the lengths of the components supplied, assuming that they are normally distributed.
If three of the components are selected at random, what is the probability that one is under-size, one is over-size and one satisfactory?

18. A machine dispenses peanuts into bags so that the weight of peanuts in a bag is normally distributed.
(a) Initially the mean weight of peanuts in a bag is 128.5 g and the standard deviation is 1.5 g. Find the probability that the weight of peanuts in a randomly chosen bag exceeds 130 g.
(b) The machine is given a minor overhaul that changes the mean weight, μ, of peanuts in a bag without affecting the standard deviation. Following the overhaul, 14% of bags contain more than 130 g of peanuts. Find, to four significant figures, the new value for μ.
(c) Later the machine requires a major repair, following which the mean weight of peanuts in a bag is 128.3 g, and 4% of bags contain less than 126 g. Find, to three significant figures, the standard deviation of the weight of peanuts in a bag after this major repair. (NEAB)

19. A machine is used to fill cans of soup with a nominal volume of 0.450 litres. Suppose that the machine delivers a quantity of soup which is normally distributed with mean μ litres and standard deviation σ litres. Given that μ = 0.457 and σ = 0.004, find the probability that a randomly chosen can will contain less than the nominal volume.
It is required by law that no more than 1% of cans contain less than the nominal volume.
Find
(a) the least value of μ which will comply with the law if σ = 0.004,
(b) the greatest value of σ which will comply with the law if μ = 0.457. (MEI)

20. The masses of packets of sugar are normally distributed. In a large consignment of packets of sugar, it is found that 5% of them have a mass greater than 510 g and 2% have a mass greater than 515 g. Estimate the mean and the standard deviation of this distribution. (C)

21. On a particular day, 50% of the employees in a large company had arrived at work by 8.30 a.m., and 10% had not arrived by 8.55 a.m.
(a) Assuming a normal model, find the standard deviation of the arrival times, in minutes.
(b) It is given that only 5% of the employees had arrived by 8.05 a.m. Without further calculation, explain why this might suggest that a normal model is not appropriate.
(c) Eighty employees are selected at random. Find the expectation of the number of these employees that arrived between 8.30 and 8.55 a.m. (C)


THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION 


Under certain circumstances the normal distribution can be used as an approximation to the binomial distribution. One practical advantage is that the calculations for finding probabilities are much less tedious to perform.

The diagrams opposite illustrate the distribution B(n, p) for p = 0.2 and p = 0.5, for various values of n. In each case a vertical line graph has been drawn, and to make comparison easier, a curve has been superimposed on each.


[Figure: vertical line graphs of P(X = x) for X ~ B(n, 0.2) and X ~ B(n, 0.5), each with a superimposed curve: (a) n = 5, (b) a larger value of n, (c) n = 20.]
Notice that
• when p = 0.5, the distributions are symmetric, and for larger values of n the distribution takes on the characteristic normal shape,
• when p = 0.2, the distribution is positively skewed for small values of n, but when n = 20, the distribution is almost symmetrical.


For the discrete random variable X, distributed binomially where X ~ B(n, p), the mean μ = E(X) = np and the variance σ² = Var(X) = npq (see page 286), where q = 1 − p.


When n is large and p is not too far from 0.5, a normal distribution with mean np and variance npq can be used as an approximation to the binomial distribution.

A rule that can be used is as follows:

If X ~ B(n, p) and n and p are such that np > 5 and nq > 5, where q = 1 − p, then
X ~ N(np, npq) approximately.


CONTINUITY CORRECTIONS

The following example compares probabilities obtained using a binomial distribution and a normal approximation. It also illustrates the use of a continuity correction, needed when using a continuous distribution (the normal distribution) as an approximation for a discrete distribution (the binomial distribution).


Example 7.14
Find the probability of obtaining 4, 5, 6 or 7 heads when a fair coin is tossed 12 times
(a) using the binomial distribution,
(b) using a normal approximation to the binomial distribution.

Solution 7.14
X is the number of heads in 12 tosses.
Since the coin is fair, P(head) = 0.5, so X ~ B(12, 0.5).

(a) Using the binomial distribution,
P(X = 4) = ¹²C₄(0.5)⁸ × (0.5)⁴ = 0.1208...
P(X = 5) = ¹²C₅(0.5)¹² = 0.1933...
P(X = 6) = ¹²C₆(0.5)¹² = 0.2255...
P(X = 7) = ¹²C₇(0.5)¹² = 0.1933...
P(4 ≤ X ≤ 7) = 0.733 (3 d.p.)

(b) The diagram shows the probability distribution for X ~ B(12, 0.5). Note that the vertical lines have been replaced by rectangles to help illustrate the intention to use a continuous distribution as an approximation for a discrete one. The required binomial probability is represented by the sum of the areas of the shaded rectangles.

First check the conditions for a normal approximation:
np = 12 × 0.5 = 6, so np > 5
nq = 12 × 0.5 = 6, so nq > 5
Since np > 5 and nq > 5, use the normal approximation
X ~ N(np, npq) with np = 6, npq = 12 × 0.5 × 0.5 = 3
So X ~ N(6, 3).

Superimposing the curve which is approximately N(6, 3), the probability of obtaining 4, 5, 6 or 7 heads is found by considering the area under this normal curve from x = 3.5 to x = 7.5.

P(4 ≤ X ≤ 7) transforms to P(3.5 < X < 7.5) using a continuity correction.
Writing the symbol → to represent 'transforms to',
P(4 ≤ X ≤ 7) → P(3.5 < X < 7.5)
= P((3.5 − 6)/√3 < Z < (7.5 − 6)/√3)
= P(−1.443 < Z < 0.866)
= Φ(1.443) + Φ(0.866) − 1
= 0.732 (3 d.p.)

Note that the probabilities found by the two different methods compare well and the working for part (b) is quicker to perform. The approximation is good because, although n is not very large, p = 0.5.

More about continuity corrections
Continuity corrections sometimes cause difficulties, so these are considered in more detail, using the diagram for the distribution of the number of heads when a coin is tossed 12 times.

If you want the probability that there are three heads or fewer, i.e. P(X ≤ 3), then consider P(X < 3.5) (the rectangle for 3 is included).
If you want the probability that there are fewer than three heads, i.e. P(X < 3), then consider P(X < 2.5) (the rectangle for 3 is not included).
If you want the probability that there are exactly three heads, i.e. P(X = 3), then consider P(2.5 < X < 3.5).

Consider these further examples.
P(5 ≤ X ≤ 8) → P(4.5 < X < 8.5)    (5, 6, 7 or 8 heads)
P(5 < X ≤ 8) → P(5.5 < X < 8.5)    (6, 7 or 8 heads)
P(5 ≤ X < 8) → P(4.5 < X < 7.5)    (5, 6 or 7 heads)
P(5 < X < 8) → P(5.5 < X < 7.5)    (6 or 7 heads)
P(X < 4) → P(X < 3.5)    (0, 1, 2 or 3 heads)
P(X ≤ 4) → P(X < 4.5)    (0, 1, 2, 3 or 4 heads)
P(X > 3) → P(X > 3.5)    (4, 5, ..., 12 heads)
P(X = 9) → P(8.5 < X < 9.5)    (9 heads)
P(X = 7) → P(6.5 < X < 7.5)    (7 heads)
P(X ≥ 0) → P(X > −0.5)    (0, 1, 2, ..., 12 heads)
P(X ≥ 1) → P(X > 0.5)    (1, 2, 3, ..., 12 heads)
P(X = 0) → P(−0.5 < X < 0.5)    (0 heads)
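The comparison in Example 7.14 can be reproduced by computing the exact binomial sum and the continuity-corrected normal area side by side. This is a sketch only, assuming Python 3.8+ (`math.comb` and `statistics.NormalDist`); both function names are hypothetical. The discrete range a, ..., b is replaced by the interval from a − 0.5 to b + 0.5, exactly as in the table of transformations above.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_between(n: int, p: float, a: int, b: int) -> float:
    """Exact P(a <= X <= b) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(a, b + 1))


def normal_between(n: int, p: float, a: int, b: int) -> float:
    """Normal approximation to P(a <= X <= b) with a continuity correction.

    X ~ B(n, p) is replaced by Y ~ N(np, npq), and the discrete range a..b
    becomes the continuous interval (a - 0.5, b + 0.5).
    """
    y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return y.cdf(b + 0.5) - y.cdf(a - 0.5)


exact = binomial_between(12, 0.5, 4, 7)
approx = normal_between(12, 0.5, 4, 7)
print(f"binomial: {exact:.3f}, normal approximation: {approx:.3f}")
```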
Exercise 7e Continuity corrections

Write down the transformations for each of the following, when a normal distribution is to be used as an approximation for a binomial distribution.

1. PR<X<9)
2. PiB<X<9)
3. P(10 < X < 24)
4. P(2 < X < 8)
5. P(X > 54)
6. P(X > 76)
7. P(45 < X < 67)
8. P(X < 109)
9. P(X < 45)
10. P(X = 56)
11. P(400 < X < 560)
12. P(X = 67)
13. P(X > 59)
14. P(X = 100)
15. P(34 < X < 43)
16. P(X = 7)
17. P(X > 509)
18. P(X < 7)
19. P(27 < X < 29)
20. P(X = 53)

Example 7.15


In a sack of mixed grass seeds, the probability that a seed is ryegrass is 0.35. 
Find the probability that in a random sample of 400 seeds from the sack, 


(a) less than 120 are ryegrass seeds, 


(b) between 120 and 150 (inclusive) are ryegrass, 


(c) more than 160 are ryegrass seeds. 


Solution 7.15 


X is the number of ryegrass seeds in a sample of 400 seeds. 


n = 400, p = 0.35, q = 0.65, so X ~ B(400, 0.35)

To see whether a normal approximation is suitable, check the values of np and nq:
np = 400 × 0.35 = 140 and nq = 400 × 0.65 = 260.
Since np > 5 and nq > 5, use the normal approximation
X ~ N(np, npq) with np = 140, npq = 400 × 0.35 × 0.65 = 91
So X ~ N(140, 91)

(a) P(X < 120) → P(X < 119.5) (continuity correction)
P(X < 119.5) = P(Z < (119.5 − 140)/√91)
= P(Z < −2.149)
= 1 − Φ(2.149)
= 0.0158
The probability that there are less than 120 ryegrass seeds is 0.016 (2 s.f.).

(b) P(120 ≤ X ≤ 150) → P(119.5 < X < 150.5) (continuity correction)
P(119.5 < X < 150.5) = P((119.5 − 140)/√91 < Z < (150.5 − 140)/√91)
= P(−2.149 < Z < 1.101)
= Φ(2.149) + Φ(1.101) − 1
= 0.8487
The probability that there are between 120 and 150 ryegrass seeds is 0.85 (2 s.f.).
(c) P(X > 160) → P(X > 160.5) (continuity correction)
P(X > 160.5) = P(Z > (160.5 − 140)/√91)
= P(Z > 2.149)
= 1 − Φ(2.149)
= 0.0158
The probability that there are more than 160 ryegrass seeds is 0.016 (2 s.f.).
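The three parts of Example 7.15 use the same approximating distribution N(140, 91) with different continuity-corrected boundaries. A minimal sketch, assuming Python's standard `statistics.NormalDist` (which takes a standard deviation, hence the square root of 91).

```python
from math import sqrt
from statistics import NormalDist

n, p = 400, 0.35
# Approximating normal distribution N(np, npq) = N(140, 91)
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

p_a = Y.cdf(119.5)                   # P(X < 120)          -> P(Y < 119.5)
p_b = Y.cdf(150.5) - Y.cdf(119.5)    # P(120 <= X <= 150)  -> P(119.5 < Y < 150.5)
p_c = 1 - Y.cdf(160.5)               # P(X > 160)          -> P(Y > 160.5)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {p_c:.4f}")
```

Parts (a) and (c) agree because 119.5 and 160.5 lie the same distance either side of the mean 140.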


NOTE: You should define X as binomial, then check that the conditions for a normal approximation are satisfied before defining the approximate normal distribution.


Example 7.16 


It is given that 40% of the population support the Gamboge Party. One hundred and fifty 
members of the population are selected at random. Use a suitable approximation to find the 
probability that more than 55 out of the 150 support the Gamboge Party. (C) 


Solution 7.16 


X is the number in 150 who support the Gamboge Party.
n = 150, p = 0.4, q = 0.6
so X ~ B(n, p) with n = 150, p = 0.4, q = 0.6

Check np and nq:
np = 150 × 0.4 = 60, nq = 150 × 0.6 = 90

Since np > 5 and nq > 5, use the normal approximation
X ~ N(np, npq) with np = 60, npq = 150 × 0.4 × 0.6 = 36

So X ~ N(60, 36)
P(X > 55) → P(X > 55.5) (continuity correction)
= P(Z > (55.5 − 60)/6)
= P(Z > −0.75)
= Φ(0.75)
= 0.7734
= 0.77 (2 s.f.)
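To see how good the approximation in Example 7.16 is, the normal answer can be set beside the exact binomial sum P(X ≥ 56). This sketch assumes Python 3.8+ (`math.comb`, `statistics.NormalDist`); the exact value is for comparison only and is not part of the worked solution.

```python
from math import comb
from statistics import NormalDist

n, p = 150, 0.4

# Normal approximation: X ~ N(60, 36), P(X > 55) -> P(Y > 55.5)
approx = 1 - NormalDist(mu=60, sigma=6).cdf(55.5)

# Exact binomial value: P(X >= 56) = sum of P(X = r) for r = 56, ..., 150
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(56, n + 1))

print(f"normal approximation: {approx:.4f}, exact binomial: {exact:.4f}")
```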


DECIDING WHEN TO USE A NORMAL APPROXIMATION AND WHEN TO 
USE A POISSON APPROXIMATION FOR A BINOMIAL DISTRIBUTION 
For X ~ B(n, p)
• a Poisson approximation can be used when n is large (n > 50) and p is small (p < 0.1).
Then X ~ Po(np) approximately.
• a normal approximation can be used when n and p are such that np > 5 and nq > 5.
Then X ~ N(np, npq) approximately.
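The two rules of thumb above can be written as a small decision function. It is only a sketch: the name `applicable_approximations` is hypothetical, and it simply reports every rule that is satisfied, since the text does not say which to prefer when both hold.

```python
from typing import List


def applicable_approximations(n: int, p: float) -> List[str]:
    """List the approximations to B(n, p) allowed by the rules of thumb in the text.

    Poisson: n > 50 and p < 0.1, giving Po(np).
    Normal:  np > 5 and nq > 5, giving N(np, npq).
    Both rules can hold at once; they are rules of thumb, not sharp boundaries.
    """
    q = 1 - p
    methods = []
    if n > 50 and p < 0.1:
        methods.append("Poisson")
    if n * p > 5 and n * q > 5:
        methods.append("normal")
    return methods


print(applicable_approximations(100, 0.2))    # toadstools, as in Example 7.17(a)
print(applicable_approximations(100, 0.01))   # poisonous toadstools, as in Example 7.17(b)
```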


Example 7.17 


A number of different types of fungi are distributed at random in a field. Eighty % of these fungi are mushrooms, and the remainder are toadstools. Five % of the toadstools are poisonous. A man, who cannot distinguish between mushrooms and toadstools, wanders across the field and picks a total of 100 fungi. Determine, correct to two significant figures, using appropriate approximations, the probability that the man has picked
(a) at least 20 toadstools,
(b) exactly two poisonous toadstools. (C)

Solution 7.17
P(mushroom) = 0.8, P(toadstool) = 0.2, P(poisonous toadstool) = 0.05 × 0.2 = 0.01

(a) X is the number of toadstools picked in a sample of 100.
X ~ B(100, 0.2).
np = 100 × 0.2 = 20, nq = 100 × 0.8 = 80
Since np > 5 and nq > 5, use a normal approximation with
mean = np = 20,
variance = npq = 100 × 0.2 × 0.8 = 16.
X ~ N(20, 16).
P(X ≥ 20) → P(X > 19.5)
= P(Z > (19.5 − 20)/4)
= P(Z > −0.125)
= Φ(0.125)
= 0.5498
= 0.55 (2 s.f.)

(b) X is the number of poisonous toadstools in 100 fungi.
X ~ B(100, 0.01)
np = 100 × 0.01 = 1
Use a Poisson distribution, since n > 50 and p < 0.1.
X ~ Po(1)
P(X = 2) = e⁻¹ × 1²/2!
= 0.1839...
= 0.18 (2 s.f.)
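Both approximations in Example 7.17 can be evaluated numerically, and in part (b) the exact binomial probability is cheap enough to compute for comparison. A minimal sketch, assuming Python 3.8+ (`math.comb`, `math.factorial`, `statistics.NormalDist`); the exact value is our own addition for checking only.

```python
from math import comb, exp, factorial
from statistics import NormalDist

# (a) Toadstools: X ~ B(100, 0.2) approximated by N(20, 16); P(X >= 20) -> P(Y > 19.5)
p_a = 1 - NormalDist(mu=20, sigma=4).cdf(19.5)

# (b) Poisonous toadstools: X ~ B(100, 0.01) approximated by Po(1)
lam = 100 * 0.01
p_b_poisson = exp(-lam) * lam**2 / factorial(2)

# Exact binomial value of P(X = 2), for comparison only
p_b_exact = comb(100, 2) * 0.01**2 * 0.99**98

print(f"(a) {p_a:.4f}  (b) Poisson {p_b_poisson:.4f}, exact {p_b_exact:.4f}")
```

Both values in part (b) round to 0.18, which supports the choice of a Poisson approximation there.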


Exercise 7f The normal approxim< 
L 


An ordinary unbiased die is thrown 120 times. 
Using a suitable approximation, find the 
probability of obtaining at least 24 sixes. 


. State conditions under which the distribution 


B(n, p) can be approximated by a normal 8 

distribution. 

The random variable X has the distribution 

B(25, 0.38). 

(a) Verify that the distribution can be 
approximated by a normal distribution. 

(b) Use the normal approximation to calculate 
the probability that X takes the values 15, 
16, 17, 18 or 19, 

(c) Use the normal approximation to calculate 
the probability that X takes the value 12.(C) 


o the binomial 


(b) exactly 20 in a random sample of 
100 people, 

{c) more than 200 in a random sample of 
1000 people. 


~ It is estimated that one-fifth of the population of 


England watched last year’s Cup Final on 
television. If random samples of 100 people are 
interviewed, calculate the mean and variance of 
the number of people from these samples who 
watched the Cup Fina! on television. 

Estimate, to two significant figures, the 
probability that in a random sample of 100 
people, more than 30 watched the Cup Final on 
television. (L Additional) 


9. Ina series of # independent trials, the probability 


- 10% of the chocolates produced in a factory are 


mis-shapes. A random sample of 1000 chocolates 
is taken. Find the probability that 


(a) less than 80 are mis-shapes, 

(b) between 90 and 115 (inclusive) are 
mis-shapes, 

(c) 120 or more are mis-shapes. 


. When I try to send a fax, the probability that I 


can successfully send it is 0.85. 


(a) I try to send eight faxes. Use a binomial 
model to find the probability that I can 
successfully send at least seven of the faxes. 

{b) I try to send 50 faxes. Use a normal 
approximation to the binomial model to 
find the probability that I can successfully 
send at least 45 faxes. (C) 


. Ata particular hospital, records show that each 


day, on average, only 80% of people keep their 
appointment at the outpatients’ clinic. 

Find the probability that on a day when 200 
appointments have been booked, 


11. 


(a) more than 170 patients keep their 
appointments, 

{b) at least 155 patients keep their 
appointments. 


. The random variable X is distributed 
B(200; 0.7). 12. 


Use the normal approximation to the binomial 
distribution to find 

(a) P(X > 130), 

{b) P(136 < X < 148), 

{c) P(X < 142), 

(d) P(X = 152). 


+ One-fifth of a given population has a minor eye 


defect. Use the normal distribution as an 
approximation to the binomial distribution to 
estimate the probability that the number of 
people with the defect is 


(a) more than 20 in a random sample of
100 people, 


10. 


of a success at each trial is p. If R is the random 
variable denoting the total number of successes, 
state the probability that R = r. State also the 
mean and variance of R. 

A certain variety of flower seed is sold in packets 
containing 1000 seeds. It is claimed on the 
packet that 40% will bloom white and 60% will 
bloom red. This may be assumed to be accurate. 
Five seeds are planted. Find the probability that 


(a) exactly three will bloom white,
(b) at least one will bloom white. 


100 seeds are planted. Use the normal
approximation to estimate the probability of 
obtaining between 30 and 45 (inclusive) white 
flowers. 


A certain tribe is distinguished by the fact that 
45% of the males have six toes on their right 
foot. Find the probability that, in a group of 200 
males from the tribe, more than 97 have six toes 
on their right foot. 


A lorry load of potatoes has, on average, one 
rotten potato in six. A greengrocer decides to 
refuse the consignment if she finds more than 18 
rotten potatoes in a random sample of 100. Find 
the probability that she accepts the consignment. 


State conditions under which a binomial 
probability model can be well approximated by a 
normal model. 

X is a random variable with the distribution 
B(12, 0.42). 


(a) Anne uses the binomial distribution to
calculate the probability that X < 4 and
gives four significant figures in her answer.
What answer should she get?

(b) Ben uses a normal distribution to calculate
an approximation for the probability that
X < 4 and gives four significant figures in his
answer. What answer should he get?

(c) Given that Ben’s working is correct,
calculate the percentage error in his answer. 

(C) 
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THE NORMAL APPROXIMATION TO THE POISSON DISTRIBUTION 


If X follows a Poisson distribution with parameter λ, i.e. X ~ Po(λ),
then E(X) = λ and Var(X) = λ.
When λ is large (say λ > 15), the normal distribution can be used as an approximation, where
X ~ N(λ, λ).
As with the normal approximation to the binomial distribution, a continuity correction is 
needed, since you are using a continuous distribution as an approximation to a discrete one. 
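The approximation and its continuity correction can be sketched in Python, using the standard library's `statistics.NormalDist` (Python 3.8+) in place of printed tables. The function name `poisson_normal_approx` is a hypothetical helper written for this illustration; it treats both bounds as inclusive integers and, like the rule of thumb above, is only sensible when λ is large (say λ > 15), which it does not check.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional


def poisson_normal_approx(lam: float, lower: Optional[int] = None,
                          upper: Optional[int] = None) -> float:
    """Approximate P(lower <= X <= upper) for X ~ Po(lam), both bounds inclusive.

    Uses Y ~ N(lam, lam) with a continuity correction: the integer range
    lower..upper becomes the interval (lower - 0.5, upper + 0.5).
    A bound of None means there is no bound on that side.
    """
    y = NormalDist(mu=lam, sigma=sqrt(lam))  # variance lam, so sigma = sqrt(lam)
    p_below_upper = 1.0 if upper is None else y.cdf(upper + 0.5)
    p_below_lower = 0.0 if lower is None else y.cdf(lower - 0.5)
    return p_below_upper - p_below_lower


# P(X >= 40) for X ~ Po(36): lower bound only, corrected to Y > 39.5
print(poisson_normal_approx(36, lower=40))
# P(X = 30) for X ~ Po(30): corrected to 29.5 < Y < 30.5
print(poisson_normal_approx(30, 30, 30))
```

Strict inequalities should be turned into inclusive integer bounds first: for example, P(X < 33) is P(X ≤ 32), i.e. `upper=32`.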


Example 7.18 


A radioactive disintegration gives counts that follow a Poisson distribution with a mean count 


of 25 per second.
Find the probability that in a one-second interval the count is between 23 and 27 inclusive.


Solution 7.18 


X is the radioactive count in a one-second interval. 
X ~ Po(25) 

E(X) = 25, Var(X) =25 

Using a normal approximation, 


X ~ N(25, 25)
P(23 ≤ X ≤ 27) ≈ P(22.5 < X < 27.5)   (continuity correction)
= P((22.5 − 25)/5 < Z < (27.5 − 25)/5)
= P(−0.5 < Z < 0.5)
= 2Φ(0.5) − 1
= 0.383 (3 d.p.)


NOTE: this compares very well with the value given by the Poisson distribution.
Check it for yourself.
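A minimal Python sketch of this check, assuming Python 3.8+: it sums the exact Poisson probabilities P(X = 23), ..., P(X = 27) for X ~ Po(25) and sets the total beside the normal approximation with continuity correction. The helper `poisson_pmf` is written here for illustration.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

LAM = 25  # mean count per second in Example 7.18


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Exact Poisson probability P(23 <= X <= 27)
exact = sum(poisson_pmf(k, LAM) for k in range(23, 28))

# Normal approximation: P(22.5 < Y < 27.5) with Y ~ N(25, 25)
y = NormalDist(mu=LAM, sigma=sqrt(LAM))
approx = y.cdf(27.5) - y.cdf(22.5)

print(f"exact Poisson : {exact:.4f}")
print(f"normal approx : {approx:.4f}")
```

Both values round to 0.383 (3 d.p.), which is the agreement the note refers to.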


Exercise 7g The normal approximation to the Poisson distribution

The number of calls received by an … follows a Poisson distribution
with parameter 30. Using the normal
approximation to the Poisson distribution, find
the probability that, in one hour,
(a) there are more than 33 calls,
(b) there are between 25 and 28 calls (inclusive),
(c) there are 34 calls.

2. If X ~ Po(35), use the normal approximation to
find
(a) P(X < 33),
(b) P(33 < X < 37),
(c) P(X > 37),
(d) P(X = 37).

3. If X ~ Po(60), use the normal approximation to
find
(a) P(50 < X < 58),
(b) P(57 < X < 68),
(c) P(X > 52),
(d) P(X > 70).

6. The number of bacteria on a plate viewed under
a microscope follows a Poisson distribution with
parameter 60. Find the probability that there
are between 55 and 75 bacteria on a plate.
A plate is rejected if fewer than 38 bacteria are
found. If 2000 such plates are viewed, how many
will be rejected?


(a) P(X <25), 
(b) P(22<X<26), 


In a certain factory the number of accidents 
occurring in a month follows a Poisson
distribution with mean 4. Find the probability that
there will be at least 40 accidents during one year.




7. In an experiment with a radioactive substance


the number of particles reaching a counter over a 
given period of time follows a Poisson 
distribution with mean 22. Find the probability 
that the number of particles reaching the counter 
over the given period of time is 


(a) less than 22, 
(b) between 25 and 30, 
(c) 18 or more.


8. Accidents on a certain railway
line occur at an average rate of one every two
months. Find the probability that

(a) there are 25 or more accidents in four years,
(b) there are 30 or fewer accidents in five years.


9. The number of eggs laid by an insect follows a
Poisson distribution with parameter 200.


(a) Find the probability that 

(i) more than 170 eggs are laid, 

(ii) more than 205 eggs are laid, 

(iii) between 180 and 240 eggs (inclusive) 
are laid. 

(b) If the probability that an egg develops is 0.1, 
show that the number of survivors follows a 
Poisson distribution with parameter 20 and 
find the probability that there are more than 
30 survivors. 


10. When a trainee typist types a document the

number of mistakes made on any one page is a 

Poisson variable with mean 3, independently of 

the number of mistakes made on any other page. 

Use tables, or otherwise, to find, to three 

significant figures, 

(a) the probability that the number of mistakes 
on the first page is less than two, 

(b) the probability that the number of mistakes 
on the first page is more than four. 


When the typist types a 48-page document the 
total number of mistakes made by the typist is a 
Poisson variable with mean 144. 

Use a suitable approximate method to find, to 
three decimal places, the probability that this total 
number of mistakes is greater than 130. (NEAB) 


11. Tomatoes from a particular nursery are packed
in boxes and sent to a market. 

Assuming that the number of bad tomatoes in a 
box has a Poisson distribution with mean 0.44, 
find, to three significant figures, the probability 
of there being 

(a) fewer than two, 

(b) more than two bad tomatoes in a box
when it is opened. 


Use a normal approximation to find, to three 
decimal places, the probability that in 50 
randomly chosen boxes there will be fewer than 
20 bad tomatoes in total. (L) 


12. A large silo is filled with grain harvested by a
farmer. The grain is contaminated with insect
pests called weevils. The farmer finds that there
are on average three weevils per litre of grain. 


(a) State two conditions which are necessary for 
the Poisson distribution to be a suitable 
model for the number of weevils which 
would be found in a given volume of grain.


Assume that the Poisson distribution can be used 
in this case. 


(b) Find, to three decimal places, 

(i) the probability that 1 litre of grain 
contains at least one weevil, 

(ii) the probability that 4 litres of grain 
contain exactly ten weevils. 

(c) Use an appropriate distributional
approximation to estimate the probability 
that 10 litres of grain contain fewer than 25 
weevils, giving your answer to three decimal 
places. (NEAB)


13. A biologist gathers leaves of a certain plant in
order to collect insects of a particular type. From
past experience she knows that the distribution of
the number of insects on a leaf may be modelled
by a Poisson distribution with mean 0.81.


(a) Calculate, to three decimal places, the 
probability that the number of insects on the 
next leaf to be examined will be fewer than 
three. 

(b) Determine, to three decimal places, the 
probability that the total number of insects 
on the next ten leaves to be examined will 
lie between six and 12 (both inclusive). 

(c) Use a distributional approximation to find,
to two decimal places, the probability that 

the total number of insects on the next 

50 leaves to be examined will exceed 45.

(NEAB) 


14. (a) Give two conditions which must apply when
modelling a random variable by a Poisson 
distribution. 


A particular make of kettle is sold by a shop at 
an average rate of five per week. The random 
variable X represents the number of kettles sold 
in any one week and X is modelled by a Poisson
distribution. 
The shop manager notices that at the beginning of 
a particular week there are seven kettles in stock. 


(b) Find the probability that the shop will not 
be able to meet all the demands for kettles 
that week, assuming that it is not possible to 
restock during the week. 


In order to increase sales performance, the 
manager decides to have in stock at the 
beginning of each week sufficient kettles to have 
at least a 99% chance of being able to meet all 
demands during that week. 


(c) Find the smallest number of kettles that 
should be in stock at the beginning of each 
week.

(d) Using a suitable approximation find the
probability that the shop sells at least 18 
kettles in a four-week period, subject to stock 
always being available to meet demand. (L) 


Summary
• Normal variable X
X ~ N(μ, σ²): E(X) = μ, Var(X) = σ²
• Standard normal variable Z
Z ~ N(0, 1): E(Z) = 0, Var(Z) = 1
To standardise X, use Z = (X − μ)/σ.
• Using the standard normal tables:
P(Z > a) = 1 − Φ(a)
P(Z > −a) = Φ(a)
P(Z < −a) = 1 − Φ(a)
P(−a < Z < b) = Φ(a) + Φ(b) − 1
P(|Z| < a) = 2Φ(a) − 1
P(a < Z < b) = Φ(b) − Φ(a)
Using the tables in reverse:
If Φ(a) = k, i.e. P(Z < a) = k, then a = Φ⁻¹(k).
• The normal approximation to the binomial distribution
If X ~ B(n, p) and np > 5, nq > 5, then X ~ N(np, npq).
• The normal approximation to the Poisson distribution
If X ~ Po(λ) and λ > 15, then X ~ N(λ, λ).
• A Poisson approximation to the binomial distribution
If X ~ B(n, p) and n is large (n > 50) and p is small (p < 0.1), then X ~ Po(np).
• Continuity corrections
These must be used when using a continuous distribution (e.g. normal) as an
approximation to a discrete distribution (e.g. binomial, Poisson).
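The table rules in this summary can be checked numerically. In this sketch (Python 3.8+), `statistics.NormalDist().cdf` plays the role of Φ and `inv_cdf` the role of Φ⁻¹; the values of `a` and `b` and the helper `standardise` are illustrative choices, not part of the text.

```python
from statistics import NormalDist

Z = NormalDist()        # standard normal variable, Z ~ N(0, 1)
phi = Z.cdf             # phi(a) = P(Z < a): the value read from the tables
phi_inv = Z.inv_cdf     # tables in reverse: phi_inv(k) = a where P(Z < a) = k


def standardise(x: float, mu: float, sigma: float) -> float:
    """Return z = (x - mu) / sigma for X ~ N(mu, sigma^2)."""
    return (x - mu) / sigma


a, b = 0.5, 1.2         # illustrative values only

upper_tail = 1 - phi(a)                  # P(Z > a)      = 1 - phi(a)
upper_tail_neg = phi(a)                  # P(Z > -a)     = phi(a)
lower_tail_neg = 1 - phi(a)              # P(Z < -a)     = 1 - phi(a)
between_neg_a_b = phi(a) + phi(b) - 1    # P(-a < Z < b) = phi(a) + phi(b) - 1
within_a = 2 * phi(a) - 1                # P(|Z| < a)    = 2 phi(a) - 1
between_a_b = phi(b) - phi(a)            # P(a < Z < b)  = phi(b) - phi(a)

print(f"P(Z > a) = {upper_tail:.4f}, P(|Z| < a) = {within_a:.4f}")
print(f"phi_inv(0.75) = {phi_inv(0.75):.4f}, phi_inv(0.99) = {phi_inv(0.99):.4f}")
```

Each rule only needs Φ at positive arguments, which is what the tables provide; the symmetry Φ(−a) = 1 − Φ(a) is what makes them work.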
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Miscellaneous worked examples 


Example 7.19 


A product is sold in packets whose masses are normally distributed with a mean of 1.42 kg 
and a standard deviation of 0.025 kg. 


(a) Find the probability that the mass of a packet, selected at random, lies between 1.37 kg 
and 1.45 kg. 


(b) Estimate the number of packets, in an output of 5 000, whose mass is less than 1.35 kg. (C) 


Solution 7.19 


X is the mass, in kilograms, of a packet. 
X ~ N(1.42, 0.025²)


(a) P(1.37 < X < 1.45)
Standardising, using Z = (X − 1.42)/0.025,
P(1.37 < X < 1.45) = P((1.37 − 1.42)/0.025 < Z < (1.45 − 1.42)/0.025)
= P(−2 < Z < 1.2)
= Φ(1.2) + Φ(2) − 1
= 0.8849 + 0.9772 − 1
= 0.8621


The probability that the mass lies between 1.37 kg and 1.45 kg is 0.86 (2 s.f.). 
(b) P(X < 1.35) = P(Z < (1.35 − 1.42)/0.025)
= P(Z < −2.8)
= 1 − Φ(2.8)
= 1 − 0.9974
= 0.0026
Since there are 5000 packets, multiply the probability by 5000.
5000 × 0.0026 = 13
About 13 packets have a mass less than 1.35 kg.
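Example 7.19 can be reproduced as a short Python sketch, assuming Python 3.8+ for `statistics.NormalDist`. The exact cumulative distribution function is used instead of four-figure tables, so the answer to (a) comes out at about 0.8622 rather than 0.8621; both round to 0.86.

```python
from statistics import NormalDist

X = NormalDist(mu=1.42, sigma=0.025)   # packet mass in kg

p_a = X.cdf(1.45) - X.cdf(1.37)        # (a) P(1.37 < X < 1.45)
p_b = X.cdf(1.35)                      # (b) P(X < 1.35)
n_light = 5000 * p_b                   # expected number of light packets out of 5000

print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.5f}, i.e. about {round(n_light)} packets out of 5000")
```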


Example 7.20 


In a certain cross country running competition the times that each of the 136 runners took to 
complete the course were recorded to the nearest minute. The winner completed the course in 
23 minutes and the final runner came in with a time of 78 minutes. The full results are 
summarised in the table below. 


Recorded time   20-29   30-39   40-49   50-59   60-69   70-79
Frequency         7      21      42      37      20       9


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(a) Use linear interpolation to estimate the median time.
The upper and lower quartiles of the time taken are 58.1 and 40.9 respectively.
(b) On graph paper, draw a box and whisker plot for these results. You should mark the end points, the median and the quartiles clearly on your diagram.
(c) Comment on the skewness.
Assume that the time taken by the runners to complete the course follows a normal distribution with the values for the quartiles as given above.
(d) Calculate the mean of this normal distribution.
(e) Calculate the standard deviation of this normal distribution. (L)


Solution 7.20


Recorded time          < 29.5   < 39.5   < 49.5   < 59.5   < 69.5   < 79.5
Cumulative frequency      7       28       70      107      127      136

(a) For grouped data, the median, Q₂, is the ½n th value, i.e. Q₂ is the 68th value. This lies in the interval 39.5 to 49.5.
This interval has 42 items in it and is of width 10.
Q₂ = 39.5 + ((68 − 28)/42) × 10 = 49.0 (1 d.p.)
The median time is 49.0 minutes.

(b) [Box and whisker plot on a Time (min) axis from 20 to 80, marking the end points 23 and 78, the quartiles 40.9 and 58.1 and the median 49.0.]

(c) The distribution appears to be nearly symmetrical:
Q₃ − Q₂ = 58.1 − 49.0 = 9.1
Q₂ − Q₁ = 49.0 − 40.9 = 8.1
So Q₃ − Q₂ > Q₂ − Q₁, but only just. The distribution has a slight positive skew.

(d) Assuming X ~ N(μ, σ²), the normal distribution is symmetrical, so the mean lies midway between the quartiles:
μ = ½(40.9 + 58.1) = 49.5
The mean is 49.5 minutes.

(e) X ~ N(49.5, σ²) and P(X < 58.1) = 0.75.
The central 50% of the distribution is enclosed between Q₁ and Q₃, so the z value for Q₃ corresponds to an upper tail probability of 0.25, i.e. a lower tail probability of 0.75.
P(Z < z) = 0.75 where z = (58.1 − 49.5)/σ
Φ(z) = 0.75
z = Φ⁻¹(0.75) = 0.6745
so (58.1 − 49.5)/σ = 0.6745
σ = 8.6/0.6745 = 12.75...
The standard deviation is 12.8 minutes (1 d.p.).


Example 7.21


Machine A, used for filling bags with ground coffee, can be set to dispense any required mean weight of coffee per bag. At any setting the weight of coffee in a bag can be modelled by a normal distribution with a standard deviation of 1.95 g.
(a) If the machine is set to dispense a mean weight of 128 g of coffee per bag, calculate the percentage of bags that contain less than 125 g.
(b) To meet an official regulation the setting on a machine must be adjusted so that no more than 1% of bags contain less than 125 g.
(i) Calculate the smallest mean weight to which machine A should be set to meet the regulation.
(ii) Machine B will only just meet the regulation when it is set to dispense a mean weight of 128.5 g. Assuming that the weight of coffee in a bag filled by Machine B can be modelled by a normal distribution, calculate the standard deviation of this distribution. (NEAB)


Solution 7.21


X is the weight, in grams, of coffee in a bag from Machine A.
X ~ N(μ, 1.95²).
(a) μ = 128, so X ~ N(128, 1.95²).
P(X < 125) = P(Z < (125 − 128)/1.95)
= P(Z < −1.538)
= 1 − Φ(1.538)
= 1 − 0.9380
= 0.062
= 6.2%
6.2% of bags contain less than 125 g.


(b) X ~ N(μ, 1.95²) and P(X < 125) ≤ 0.01
(i) Standardising, you need to find z such that
P(Z < z) = 0.01
i.e. Φ(z) = 0.01
so Φ(−z) = 0.99
−z = Φ⁻¹(0.99) = 2.326
z = −2.326
(125 − μ)/1.95 ≤ −2.326
125 − μ ≤ −2.326 × 1.95
μ ≥ 125 + 2.326 × 1.95
μ ≥ 129.53...


The smallest mean weight is 129.5 g (1 d.p.). 
(ii) Y is the weight, in grams, of coffee in a bag from Machine B.
Y ~ N(128.5, σ²) and P(Y < 125) = 0.01
Standardising:
P(Z < z) = 0.01 where z = (125 − 128.5)/σ
From part (i), z = −2.326, so
−2.326 = −3.5/σ
σ = 3.5/2.326
= 1.504...


The standard deviation is 1.5 g (1 d.p.). 
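The three calculations in Example 7.21 follow one pattern: standardise, then read Φ or Φ⁻¹. A Python sketch, assuming Python 3.8+, with `NormalDist().inv_cdf(0.01)` standing in for the table value −2.326; the constant names are illustrative.

```python
from statistics import NormalDist

SIGMA_A = 1.95      # machine A standard deviation (g)
LIMIT = 125.0       # regulation weight (g)
z_01 = NormalDist().inv_cdf(0.01)   # z with P(Z < z) = 0.01, about -2.326

# (a) machine A set to a mean of 128 g
pct_under = 100 * NormalDist(mu=128.0, sigma=SIGMA_A).cdf(LIMIT)

# (b)(i) need (125 - mu) / 1.95 <= z_01, i.e. mu >= 125 - z_01 * 1.95
min_mean_a = LIMIT - z_01 * SIGMA_A

# (b)(ii) machine B only just meets the rule at mean 128.5 g:
# (125 - 128.5) / sigma = z_01
sigma_b = (LIMIT - 128.5) / z_01

print(f"(a) {pct_under:.1f}% of bags under 125 g")
print(f"(b)(i) smallest mean {min_mean_a:.3f} g")
print(f"(b)(ii) standard deviation for machine B {sigma_b:.3f} g")
```

The unrounded minimum mean is 129.536... g. A setting of exactly 129.5 g would give P(X < 125) of about 0.0105, slightly above 1%, so in practice the setting would be rounded up rather than down.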


Example 7.22 


It is estimated that, on average, one match in five in the Football League is drawn, and that 
one match in two is a home win. 


(a) Twelve matches are selected at random. Calculate the probability that the number of 
drawn matches is 
(i) exactly three, 
(ii) at least four. 


(b) Ninety matches are selected at random. Use a suitable approximation to calculate the 
probability that between 13 and 20 (inclusive) of the matches are drawn. 


(c) Twenty matches are selected at random. The random variables D and H are the numbers
of drawn matches and home wins, respectively, in these matches. State, with a reason, 
which of D and H can be better approximated by a normal variable. 


Solution 7.22 


X is the number of drawn matches in 12. 
X ~ B(12, 0.2) since P(draw) = 1/5 = 0.2


(a) (i) P(X = 3) = ¹²C₃(0.8)⁹(0.2)³
= 0.24 (2 s.f.)
(ii) P(X ≥ 4) = 1 − P(X ≤ 3)
= 1 − ((0.8)¹² + 12(0.8)¹¹(0.2) + ¹²C₂(0.8)¹⁰(0.2)² + ¹²C₃(0.8)⁹(0.2)³)
= 1 − 0.794...
= 0.21 (2 s.f.)
(b) X is the number of drawn matches in 90.
Then X ~ B(n, p) with n = 90, p = 0.2, q = 0.8
Now np = 90 × 0.2 = 18, nq = 90 × 0.8 = 72
Since np > 5, nq > 5, use a normal approximation
X ~ N(np, npq) with np = 18 and npq = 90 × 0.2 × 0.8 = 14.4,
so X ~ N(18, 14.4).
P(13 ≤ X ≤ 20) ≈ P(12.5 < X < 20.5)   (continuity correction)
= P((12.5 − 18)/√14.4 < Z < (20.5 − 18)/√14.4)
= P(−1.449 < Z < 0.659)
= Φ(1.449) + Φ(0.659) − 1
= 0.9264 + 0.7451 − 1
= 0.67 (2 s.f.)


(c) D is the number of drawn matches.
D ~ B(20, 0.2): np = 20 × 0.2 = 4, so np < 5.

H is the number of home wins.
H ~ B(20, 0.5): np = 20 × 0.5 = 10 > 5, nq = 10 > 5.

For H, np > 5 and nq > 5, so H can be better approximated by a normal variable.
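A Python sketch of Example 7.22, assuming Python 3.8+ for `math.comb` and `statistics.NormalDist`. It also computes the exact binomial probability for part (b) so the quality of the normal approximation can be judged; `binom_pmf` and `normal_approx_ok` are helpers written for this illustration, the second encoding the rule of thumb np > 5, nq > 5 used in the text.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb from the text: np > 5 and nq > 5."""
    return n * p > 5 and n * (1 - p) > 5


# (a) X ~ B(12, 0.2)
p_exactly_3 = binom_pmf(3, 12, 0.2)
p_at_least_4 = 1 - sum(binom_pmf(k, 12, 0.2) for k in range(4))  # 1 - P(X <= 3)

# (b) X ~ B(90, 0.2), approximated by N(np, npq) = N(18, 14.4)
n, p = 90, 0.2
y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
p_approx = y.cdf(20.5) - y.cdf(12.5)                      # continuity correction
p_exact = sum(binom_pmf(k, n, p) for k in range(13, 21))   # 13..20 inclusive

print(f"(a)(i) {p_exactly_3:.4f}  (a)(ii) {p_at_least_4:.4f}")
print(f"(b) normal {p_approx:.4f}  exact binomial {p_exact:.4f}")
# (c) D ~ B(20, 0.2) fails the rule; H ~ B(20, 0.5) passes it
print("(c) D:", normal_approx_ok(20, 0.2), "H:", normal_approx_ok(20, 0.5))
```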


Miscellaneous exercise 7h 


1. Squash balls, dropped onto a concrete floor from 


a given point, rebound to heights which can be 
modelled by a normal distribution with mean
0.8 m and standard deviation 0.2 m. The balls
are classified by height of rebound, in order of 
decreasing height, into these categories: Fast, 
Medium, Slow, Super-Slow and Rejected. 


(a) Balls which rebound to heights between 
0.65 m and 0.9 m are classified as Slow. 
Calculate the percentage of balls classified as 
Slow.

(b) Given that 9% of balls are classified as
Rejected, calculate the maximum height of
rebound of these balls.

(c) The percentages of balls classified as Fast and
as Medium are equal. Calculate the
minimum height of rebound of a ball 
classified as Fast, giving your answer correct 
to two decimal places. (C) 


2. The mass of grapes sold per day in a
supermarket can be modelled by a normal
distribution. It is found that, over a long period, 
the mean mass sold per day is 35.0 kg, and that, 
on average, less than 15.0 kg are sold on one day 
in twenty. 


(a) Show that the standard deviation of the 
mass of grapes sold per day is 12.2 kg, 
correct to three significant figures. 

(b) Calculate the probability that, on a day 
chosen at random, more than 53.0 kg are 
sold. 

(c) Ten days are chosen at random. Assuming 
independence, find the probability that less 
than 15.0 kg will be sold on exactly two of 
these days. (C) 


3. (a) Give two reasons why the normal
distribution is important in statistics. 

(b) An airline has a regular flight from one 
airport to another. The airline models the 
duration of a flight as a normally distributed 
random variable with a mean of 246 
minutes and a standard deviation of five 
minutes. Use this model to calculate, to one 
decimal place, the percentage of these flights 


that are completed in less than four hours. (NEAB)


4. The random variable X is normally distributed
with mean μ and variance σ².


Given that P(X > 58.37) = 0.02
and P(X < 40.85) = 0.01
find μ and σ. (L)


5. Alan is a member of an athletics club. In long


jump competitions, his jumps are normally 
distributed with a mean of 7.6 m and a standard 
deviation of 0.16 m. 
(a) Calculate the probability of him jumping 
(i) more than 8.0 m, 
(ii) between 7.50 m and 7.75 m. 
(b) Determine the distance exceeded by 75% of 
his jumps. 
Brian also belongs to the athletics club. In long 
jump competitions, his jumps are normally 
distributed with a mean of 7.45 m and 95.2% of 
them exceed 7.0 m. 
(c) Calculate, correct to two decimal places, the
standard deviation of Brian’s jumps. 
The athletics club has to select either Alan or 
Brian to be its long jump competitor at a major 
athletics meeting. In order to qualify for the final
rounds of jumps at the meeting, it is necessary to
achieve a jump of at least 8.0 m in the
preliminary rounds.
(d) State, with justification, which of the two 
athletes should be selected. (NEAB) 


6. The time required to complete a certain car


journey has been found from experience to have 
mean 2 hours 20 minutes and standard deviation
15 minutes. 


(a) Use a normal model to calculate the 
probability that, on one day chosen at 
random, the journey requires between 

1 hour 50 minutes and 2 hours 40 minutes.

(b) It is known that delays occur rarely on this 

journey, but that when they do occur they 

are lengthy. Give a reason why this 
information suggests that a normal 

distribution might not be a good model. (C)


7. A machine is producing a type of circular gasket.


The specifications for the use of these gaskets in 
the manufacture of a certain make of engine are 
that the thickness should lie between 5.45 mm 
and 5.55 mm, and the diameter should lie
between 8.45 mm and 8.54 mm. The machine is 
producing the gaskets so that their thicknesses 
are N(5.5, 0.0004), that is, normally distributed 
with mean 5.5 mm and variance 0.0004 mm²,
and their diameters are independently distributed 
N(8.54, 0.0025). 
Calculate, to one decimal place, the percentage 
of gaskets produced which will not meet


(a) the specified thickness limits, 
(b) the specified diameter limits, 
(c) the specifications.
Find, to three decimal places, the probability 
that, if six gaskets made by the machine are 


chosen at random, exactly five of them will meet


the specifications. 


8. A machine is used to fill tubes, of nominal 
content 100 ml, with toothpaste. The amount of 
toothpaste delivered by the machine is normally 
distributed and may be set to any required mean 
value. Immediately after the machine has been
overhauled, the standard deviation of the 
amount delivered is 2 ml. As time passes, this 
standard deviation increases until the machine is 
again overhauled. The following three conditions 
are necessary for a batch of tubes of toothpaste 
to comply with current legislation: 


I the average content of the tubes must be at
least 100 ml,

II not more than 2.5% of the tubes may
contain less than 95.5 ml,

III not more than 0.1% of the tubes may
contain less than 91 ml.
(Φ⁻¹(0.999) = 3.09)


(a) For a batch of tubes with mean content 
98.8 ml and standard deviation 2 ml, find 
the proportion of tubes which contain 
(i) less than 95.5 ml, 

(ii) less than 91 ml. 


Hence state which, if any, of the three 
conditions above are not satisfied. 


(b) If the standard deviation is 5 ml, find the
mean in each of the following cases: 
(i) exactly 2.5% of tubes contain less than 
95.5 ml, 
(ii) exactly 0.1% of tubes contain less than 
91 ml.


Hence state the smallest value of the mean 
which would enable all three conditions to 
be met when the standard deviation is 5 ml. 


(c) Currently exactly 0.1% of tubes contain less 
than 91 ml and exactly 2.5% contain less 
than 95.5 ml. 

(i) Find the current values of the mean and 
the standard deviation. 

(ii) State, giving a reason, whether you 
would recommend that the machine is 
overhauled immediately. (AEB) 


9. A wholesaler buys cauliflowers from a farmer for
distribution to retail greengrocers. 
The wholesaler classifies the lightest 15% of 
cauliflowers as small, the heaviest 25% as large, 
and the rest as medium. 


(a) Given that the wholesaler makes a profit of
2 pence on each small cauliflower, 12 pence 
on each medium one and 27 pence on each 
large one, calculate the wholesaler’s mean 
profit per cauliflower. 


The weights of the cauliflowers can be modelled 


by a normal distribution with a mean of 628 g 
and a standard deviation of 160 g. 


(b) Calculate the weight that a cauliflower must 
exceed to be classified as large. 

(c) Calculate the weight that a cauliflower must 
fall below to be classified as small. (NEAB) 


10. In 1994 an insurance company received claims
from 20% of the motorists it had insured.


(a) For a random sample of 14 motorists
insured with the company in 1994, find the 


probability that 
(i) exactly three claimed on their
insurance, 


(ii) between two and five inclusive claimed 
on their insurance, 

(iii) a majority claimed on their insurance. 
(b) For a random sample of 90 motorists 

insured with the company, use an 

appropriate approximating distribution to 

determine the probability that at least 25 

claimed on their insurance in 1994. (NEAB) 


11. A horticulturalist knows from experience that
when taking cuttings from bay trees only 15 in 
every 100 successfully take root. 


(a) In a batch of ten randomly selected cuttings, 
find the probability that 
(i) none of the cuttings take root, 

(ii) fewer than three of the cuttings take 
root. 

(b) Let n be the smallest number of cuttings
which need to be examined before there is at
least a 95% chance that one or more of
them will have taken root.

(i) Show that n satisfies (0.85)ⁿ < 0.05.
(ii) Given that (0.85)¹⁷ = 0.0631, find the
value of n.

(c) Using a suitable approximation, estimate the
probability that fewer than six in a batch of
50 cuttings take root. (L)


12. A large bag of seeds contains three varieties in
the ratios 4 : 2 : 1 and their germination rates are
50%, 60% and 80% respectively.

Show that the probability that a seed chosen at
random from the bag will germinate is 4/7.

Find, to three decimal places, the probability that 
of four seeds chosen at random from the bag, 
exactly two of them will germinate. 

Given that 150 seeds are chosen at random from 
the bag, estimate, to three decimal places, the 
probability that fewer than 90 of them will 
germinate. (L) 


13. A building society announces its intention to
convert to a bank. During the first day following 
the announcement, the number of calls per 
minute answered by the society’s hotline may be 
modelled satisfactorily by a Poisson distribution 
with mean 12. 


(a) Calculate the probability that the hotline
answers more than ten calls in a one-minute 
period. 

(b) Estimate the probability that the hotline 
answers fewer than 700 calls in one hour. 

(NEAB) 


14. (a) A trade union asked 300 of its members
whether they were full-time workers or 
part-time workers, and the number of hours 
they worked in a particular week. The table 
below shows an analysis of this survey. 


             Number of   Mean number of   Standard deviation of
             workers     hours worked     hours worked
Full-time    100         40               4.5
Part-time    200         20               6.9


The hours, both for the full-time workers 

and for the part-time workers, are normally 

distributed. 

(i) Calculate the total number of workers
who worked more than 32 hours. 

(ii) Given that only 6% of the full-time 
workers worked for less than T₁ hours,
calculate T₁.

(iii) Given that only 3% of the part-time
workers worked for more than T₂
hours, calculate T₂.

(b) A set of numbers is normally distributed; 

1.5% of the numbers exceed 1434 and 

16.6% of the numbers exceed 1194. 

Calculate the mean and the standard 

deviation of the distribution. (C) 


15. During an advertising campaign, the 
manufacturer of Wolfitt (a dog food) claimed 
that 60% of dog owners preferred to buy 
Wolfitt. Assuming that the manufacturer’s claim 
is correct for the population of dog owners, 
calculate 


(a) using the binomial distribution, and 

(b) using a normal approximation to the 
binomial,

the probability that at least six of a random 

sample of eight dog owners prefer to buy 

Wolfitt. 

Comment on the agreement, or disagreement, 

between your two values. Would the agreement 

be better or worse if the proportion had been 

80% instead of 60%? 

Continuing to assume that the manufacturer’s 

figure of 60% is correct, use the normal 

approximation to the binomial to estimate the 

probability that, of a random sample of 100 dog 

owners, the number preferring Wolfitt is between 

60 and 70 inclusive. (MEI) 


16. Six hundred rounds are fired from a gun at a 
horizontal target 50 m long which extends from 
950 m to 1000 m in range from the gun. 

The trajectories of the rounds all lie in the 
vertical plane through the gun and the target. It 
is found that 27 rounds fall short of the target 
and 69 rounds fall beyond it. 


Assuming that the range of rounds is normally 
distributed, find the mean and standard 
deviation of the range. 

Estimate the number of rounds falling within 

5 m of the centre of the target. (C)


17. A traffic survey is being undertaken on a main 
road to determine whether or not a pedestrian 
crossing should be installed. On five successive 
days, from Monday to Friday, the hour between 
8 a.m. and 9 a.m. was split up into 30-second 
intervals, and the number of vehicles passing a 
certain point in each of these intervals was 
recorded. 

The random variable X represents the number of 
cars travelling from the town centre per 
30-second interval. For the 600 observations 

the mean and variance were 3.1 and 3.27 
respectively. 


(a) Explain why X might be modelled by a 
Poisson distribution. 

(b) Using the sample mean as an estimate for 
the Poisson parameter, calculate the 
probability of recording exactly three 
vehicles travelling from the town centre in a 
30-second interval. 

(c) Calculate the probability of recording at
least six vehicles travelling from the town 
centre in a 60-second interval. 


The mean number of vehicles per 30-second 
interval passing the survey point travelling 
towards the town centre during the same survey 
period was 7.9. 


(d) Show that there is roughly a 12% chance 
that the total number of vehicles passing per 
30-second interval is ten. 

(e) Using a suitable approximation, estimate the 
probability of between 16 and 24 vehicles 
(inclusive) passing the survey point in a 
60-second interval. (MEI) 


18. [In this question give three places of decimals in
each answer.]
When a telephone call is made in the country of 
Japonica, the probability of getting the intended 
number is 0.95. 


(a) Ten independent calls are made. Find the 
probability of getting eight or more of the 
intended numbers. Find also the conditional 
probability of getting all ten intended 
numbers given that at least eight of the 
intended numbers are obtained. 

(b) Three hundred independent calls are made. 
Find the probability of failing to get the 
intended number on at least ten but not more
than twenty of the calls. 

(c) Four hundred independent calls are made.
For each call the probability of getting 
‘number unobtainable’ is 0.004. Find the 
probability of getting ‘number unobtainable’ 
fewer than three times. i) 


19. An old car is never garaged at night. On the
morning following a wet night, the probability
that the car does not start is …
On the morning following a dry night, this 
probability is 35. The starting performance of the 
car each morning is independent of its 
performance on previous mornings. 


(a) There are six consecutive wet nights.
Determine the probability that the car does 
not start on at least two of the six mornings 

(b) During a wet autumn there are 32 wet ; 
nights. Using a suitable approximation, 
determine the probability that the car does 
not start on fewer than 16 of the 32 
mornings. 

(c) During a long summer drought there are 
100 dry nights. Using a Poisson 
approximation, determine the probability 
that the car does not start on five or more of 
the 100 mornings. 


(Give three decimal places in your answers.) (C)


20. The life, in years, of a randomly chosen Flashpan
car battery is normally distributed with mean 2
and standard deviation 0.4.
Show that the probability that a randomly chosen
Flashpan battery has a life less than one year is
0.006 21, correct to five places of decimals.


(a) A farmer buys two randomly chosen Flashpan 
batteries. Find the probability that the 
batteries each have a life more than one year. 


Mixed test 7A 
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(b) A wholesaler buys 500 randomly chosen 
Flashpan batteries. Using a suitable 
approximation, find the probability that at 
most three have lives each less than one
year. 

(c) A retailer buys ten randomly chosen 
Flashpan batteries. Find the probability that 
at least four have lives each exceeding two 
years. (C)


Describe, briefly, the conditions under which the
binomial distribution Bin(n, p) may be
approximated by


(a) a normal distribution,
(b) a Poisson distribution,


giving the parameters of each of these
distributions.
Among the blood cells of a certain animal
species, the proportion of cells which are of type
A is 0.37 and the proportion of cells which are of
type B is 0.004. Find, to three decimal places, the
probability that in a random sample of eight
blood cells at least two will be of type A.

Find, to three decimal places, an approximate 
value for the probability that 


(c) in a random sample of 200 blood cells the
combined number of type A and type B cells 
is 81 or more, 

(d) there will be four or more cells of type B in 
a random sample of 300 blood cells. (L) 


1. A smoker’s blood nicotine level, measured in
ng/ml, may be modelled by a normal random
variable with mean 310 and standard deviation …


(a) What proportion of smokers have blood 
nicotine levels lower than 250? 

(b) What blood nicotine level is exceeded by
20% of smokers? (AEB)


2. The number of hours of sunshine at a resort has 
been recorded for each month for many years. 
One year is selected at random and H is the 
number of hours of sunshine in August of that 
year. H can be modelled by a normal variable 
with mean 130. 


(a) Given that P(H < 179) = 0.975, calculate the 
standard deviation of H. 
(b) Calculate P(100 < H < 150). (C) 
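Part (a) runs the usual standardisation backwards: find the z with Φ(z) = 0.975, then use (179 − 130)/σ = z. A minimal sketch of that step, assuming Python 3.8+ and the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

# Standard normal quantile: the z with Phi(z) = 0.975 (about 1.96)
z = NormalDist().inv_cdf(0.975)

# P(H < 179) = 0.975 with E(H) = 130, so (179 - 130) / sigma = z
sigma = (179 - 130) / z
print(f"z = {z:.3f}, sigma = {sigma:.2f}")  # z = 1.960, sigma = 25.00
```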


3. In a large university 90% of the students are
right-handed.

(a) Show that the probability that in a random
sample of eight students exactly six will be 
right-handed is approximately 0.149. 


(b) Find the probability that in a random
sample of 20 students fewer than 15 will be
right-handed.

(c) Determine, to two decimal places, an 
approximate value for the probability that 
in a random sample of 200 students at most 
184 will be right-handed. (NEAB) 
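Part (a) is a direct binomial calculation with X ~ B(8, 0.9), where X is the number of right-handed students in the sample. A short sketch in Python; the helper name `binom_pmf` is illustrative, not from the text:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# X ~ B(8, 0.9): exactly six of eight students right-handed
p_six = binom_pmf(6, 8, 0.9)
print(f"{p_six:.3f}")  # 0.149
```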


4. The random variable X represents the weight, in
grams, of chocolate chips in packets sold by a
supermarket. It is suggested that X can be
modelled by a normal distribution with

X ~ N(100, 25).

(a) Find P(X > 108).

(b) Show that P(|X − 100| < 6.8) = 0.8262.

Three packets are selected at random from the
packets of chocolate chips on the supermarket
shelf.

(c) Find the probability that exactly two of
them will have weights in the range
|X − 100| < 6.8.

(d) Comment on the suitability of the normal
distribution as a model for X. (L)
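Part (b) can be checked by noting that |X − 100| < 6.8 means 93.2 < X < 106.8. A minimal sketch, assuming Python's standard-library `statistics.NormalDist`; note that the second parameter of N(100, 25) is the variance, so σ = 5:

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=5)  # N(100, 25): variance 25, so sigma = 5

# P(|X - 100| < 6.8) = P(93.2 < X < 106.8) = 2*Phi(1.36) - 1
p_in = X.cdf(106.8) - X.cdf(93.2)
print(f"{p_in:.4f}")  # about 0.8262
```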
1. The area that can be painted using one litre of
Luxibrite paint is normally distributed with
mean 13.2 m² and standard deviation 0.197 m².
The corresponding figures for one litre of Maxigloss
paint are 13.4 m² and 0.343 m². It is required to
paint an area of 12.9 m². Find which paint gives
the greater probability that one litre will be
sufficient, and obtain this probability. (C)


2. Soup is sold in tins which are filled by a machine. 
The actual weight of soup delivered to a tin by 
the filling machine is always normally distributed 
about the mean weight with a standard deviation 
of 8 g. The machine is set originally to deliver a 
mean weight of 810 g. 


(a) Determine the probability that the weight of
soup in a tin, selected at random, is less than
800 g.
(b) Determine the probability that the weight of
soup in a tin, selected at random, is between
795 g and 820 g.


Proposed legislation requires that not more than
2.5% of tins may contain less than the nominal
net weight of 800 g.


(c) Assuming that the value of the standard
deviation remains unchanged, determine the
minimum mean weight that the machine
should be set to deliver in order to comply
with this requirement. (NEAB)


3. Consultants employed by a large library reported 
that the time spent in the library by a user could 
be modelled by a normal distribution with mean 
65 minutes and standard deviation 20 minutes. 


(a) Assuming that this model is adequate, what 
is the probability that a user spends 
(i) less than 90 minutes in the library, 
(ii) between 60 and 90 minutes in the
library?


The library closes at 9.00 p.m. 


(b) Explain why the model above could not 
apply to a user who entered the library at 
8.00 p.m. 

(c) Estimate an approximate latest time of entry
for which the model above could still be
plausible. (AEB)


4. Frugal Bakeries claim that packs of ten of their
buns contain on average 75 raisins. A Poisson
distribution is used to model the number of
raisins in a randomly selected bun.


(a) Specify the value of the parameter. 

(b) State any assumption required about the 
distribution of raisins in the production 
process for this model to be valid. 

(c) Show that the probability that a randomly 
selected bun contains more than eight raisins 
is 0.338. 

(d) Find the probability that in a pack of ten 
buns at least two buns contain more than 
eight raisins. 

(e) Using a suitable approximation, find the
probability that in a pack of ten buns there
are more than 80 raisins. (L)
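Part (c) follows from the Poisson probability function with λ = 75/10 = 7.5 raisins per bun. A short sketch in Python that sums the terms directly; the helper name `poisson_cdf` is illustrative:

```python
from math import exp, factorial

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam), summing the probability function term by term."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

lam = 75 / 10                    # mean number of raisins per bun: 7.5
p_more_than_8 = 1 - poisson_cdf(8, lam)
print(f"{p_more_than_8:.3f}")    # 0.338
```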


5. An engineering firm sets an aptitude test when
applicants first apply for training. The times
taken to complete the test are normally
distributed with mean 40.5 minutes and standard
deviation 7.5 minutes. Applicants who complete
the test in less than 30 minutes are immediately
accepted for training. Those who take between
30 and 36 minutes are required to take a further
test. All other applicants are rejected.


(a) For a randomly chosen applicant calculate 
the probability of 
(i) immediate acceptance for training, 

(ii) requirement to take a further test. 

(b) Given that a randomly chosen applicant was 
not rejected after this first test, calculate, to 
three decimal places, the probability that the 
applicant was immediately accepted for 
training. 

(c) On a certain occasion there were 100
applicants. Use a suitable distributional 
approximation to calculate the probability 
that more than 25 applicants were required 
to take a further test. (NEAB) 


Linear combinations of normal variables 


In this chapter you will learn about the distributions for 


• the sum of independent normal variables
• the difference of independent normal variables
• multiples of independent normal variables


You will need the following results, first introduced on pages 256 and 257. 


If X and Y are any two random variables, discrete or continuous, and a and b are any two
constants, then

Sums
E(X + Y) = E(X) + E(Y)
E(aX + bY) = aE(X) + bE(Y)

Differences
E(X − Y) = E(X) − E(Y)
E(aX − bY) = aE(X) − bE(Y)

Also, if X and Y are independent, then

Var(X + Y) = Var(X) + Var(Y)
Var(aX + bY) = a² Var(X) + b² Var(Y)

Var(X − Y) = Var(X) + Var(Y)
Var(aX − bY) = a² Var(X) + b² Var(Y)

(Remember the + sign in Var(X − Y) and Var(aX − bY).)
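These rules, and in particular the + sign in Var(aX − bY), can be checked by simulation. The sketch below is illustrative only: the distributions, the constants a = 2 and b = 5, and the seed are arbitrary choices, not from the text. The expectation rules hold for any variables, while the variance rules need independence (but not normality).

```python
import random
from statistics import fmean, pvariance

random.seed(1)
N = 200_000
a, b = 2, 5

# Independent draws: X ~ N(10, 3^2) so Var(X) = 9, and Y ~ N(4, 2^2) so Var(Y) = 4
xs = [random.gauss(10, 3) for _ in range(N)]
ys = [random.gauss(4, 2) for _ in range(N)]

combo = [a * x - b * y for x, y in zip(xs, ys)]
print(round(fmean(combo), 2))      # near a*E(X) - b*E(Y) = 20 - 20 = 0
print(round(pvariance(combo), 1))  # near a^2*Var(X) + b^2*Var(Y) = 36 + 100 = 136
```

Subtracting the variances instead would give 36 − 100 = −64, which is impossible for a variance.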


THE SUM OF INDEPENDENT NORMAL VARIABLES 


Consider this example which involves the sum of independent normal variables. 


Example 8.1 


A machine is installed in a students’ common room. It dispenses white coffee by first
releasing a quantity of black coffee, normally distributed with mean 122.5 ml and standard
deviation 7.5 ml, and then adding a quantity of milk, normally distributed with mean 30 ml
and standard deviation 5 ml.

Each cup is marked to a level of 137.5 ml and if this level is not attained the customer receives
the cup free of charge.

What percentage of cups of white coffee will be given free of charge?


Solution 8.1 

B is the amount, in millilitres, of black coffee, where B ~ N(122.5, 7.5²).
M is the amount, in millilitres, of milk, where M ~ N(30, 5²).

B and M are independent normal variables.

Consider W, the amount, in millilitres, of white coffee made by combining the black coffee
and milk, so W = B + M and

E(W) = E(B) + E(M) = 122.5 + 30 = 152.5
Var(W) = Var(B) + Var(M) = 7.5² + 5² = 81.25 (using the results above, since B and M are independent)

So W = B + M has a mean of 152.5 and a variance of 81.25.


For independent normal variables, it is true that the sum of these variables is also normally
distributed, so

B + M ~ N(152.5, 81.25)
i.e. W ~ N(152.5, 81.25)

The drink is free of charge if W < 137.5.

P(W < 137.5) = P(Z < (137.5 − 152.5)/√81.25)
= P(Z < −1.664)
= 1 − Φ(1.664)
= 1 − 0.9519
= 0.0481
= 4.81%

[Figure: normal curve for W marking 137.5 and the mean 152.5; corresponding Z values −1.664 and 0]

So approximately 5% of the cups of white coffee will be given free of charge.
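The same calculation can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8+), whose `+` operator for two `NormalDist` objects adds the means and the variances, i.e. it assumes the variables are independent, as B and M are here. A minimal sketch:

```python
from statistics import NormalDist

B = NormalDist(mu=122.5, sigma=7.5)  # black coffee, ml
M = NormalDist(mu=30, sigma=5)       # milk, ml

# Adding two NormalDist objects treats them as independent:
# the means add and the variances add.
W = B + M
print(W.mean, round(W.variance, 2))  # 152.5 81.25

p_free = W.cdf(137.5)                # P(W < 137.5)
print(f"{p_free:.3f}")               # 0.048, i.e. about 5% of cups
```

The exact value differs from the table-based 0.0481 only in the fourth decimal place, because the text rounds z to 1.664.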


In general, if X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²) are independent,
then X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²).

This result can be extended to any set of independent normal variables X₁, X₂, …, Xₙ,
where, with obvious notation, Xᵢ ~ N(μᵢ, σᵢ²):

X₁ + X₂ + … + Xₙ ~ N(μ₁ + μ₂ + … + μₙ, σ₁² + σ₂² + … + σₙ²)


Example 8.2 
Four runners, Andy, Bob, Chris and Dai, train to take part in a 1600 m relay race in which
Andy is to run 100 m, Bob 200 m, Chris 500 m and Dai 800 m.
During training their individual times, recorded in seconds, follow normal distributions.

With obvious notation, these are:
A ~ N(10.8, 0.2²), B ~ N(23.7, 0.3²), C ~ N(62.8, 0.9²) and D ~ N(121.2, 2.1²).

Find the probability that they run the relay race in less than 3 minutes 35 seconds.


Solution 8.2 


Let T be the total time, in seconds, for the relay race.
E(T) = E(A) + E(B) + E(C) + E(D)
= 10.8 + 23.7 + 62.8 + 121.2
= 218.5
Var(T) = Var(A) + Var(B) + Var(C) + Var(D)
= 0.2² + 0.3² + 0.9² + 2.1²
= 5.35

T ~ N(218.5, 5.35)

To find the probability that the total time is less than 3 minutes 35 seconds, i.e. 215 seconds,
find P(T < 215) = P(Z < (215 − 218.5)/√5.35)
= P(Z < −1.513)
= 1 − Φ(1.513)
= 1 − 0.9349
= 0.0651

[Figure: normal curve for T marking 215 and the mean 218.5; corresponding Z values −1.513 and 0]

The probability that the runners take less than 3 minutes 35 seconds is 0.065 (2 s.f.).
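A short numerical sketch of Solution 8.2, assuming (as the solution implicitly does) that the four leg times are independent; it uses Python's standard-library `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

# Leg times in seconds as (mean, standard deviation); legs assumed independent
legs = [(10.8, 0.2), (23.7, 0.3), (62.8, 0.9), (121.2, 2.1)]

mean_T = sum(m for m, _ in legs)     # means add
var_T = sum(s**2 for _, s in legs)   # variances add (standard deviations do not)
T = NormalDist(mean_T, sqrt(var_T))

print(round(mean_T, 2), round(var_T, 2))  # 218.5 5.35
print(f"{T.cdf(215):.3f}")                # P(T < 215), about 0.065
```

Adding the standard deviations (0.2 + 0.3 + 0.9 + 2.1 = 3.5) and squaring would give 12.25, not 5.35, which is why the code sums the squares.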


Consider now the special case when X₁, X₂, …, Xₙ are n independent observations from the
population of X, where X ~ N(μ, σ²),

so X₁ ~ N(μ, σ²), X₂ ~ N(μ, σ²), …, Xₙ ~ N(μ, σ²).

Then E(X₁ + X₂ + … + Xₙ) = E(X₁) + E(X₂) + … + E(Xₙ)
= μ + μ + … + μ
= nμ

Var(X₁ + X₂ + … + Xₙ) = Var(X₁) + Var(X₂) + … + Var(Xₙ)
= σ² + σ² + … + σ²
= nσ²

So X₁ + X₂ + … + Xₙ ~ N(nμ, nσ²).


Example 8.3 
Masses of a particular article are normally distributed with mean 20 g and standard deviation
2 g. A random sample of 12 such articles is chosen. Find the probability that the total mass is
greater than 230 g.


Solution 8.3 


X is the mass, in grams, of an article.
X ~ N(20, 2²).

Let T = X₁ + X₂ + … + X₁₂;
then E(T) = E(X₁) + E(X₂) + … + E(X₁₂)
= 12E(X)
= 240
Var(T) = Var(X₁) + Var(X₂) + … + Var(X₁₂)
= 12 Var(X)
= 48
So T ~ N(240, 48).

P(T > 230) = P(Z > (230 − 240)/√48)
= P(Z > −1.443)
= Φ(1.443)
= 0.9255

The probability that the total mass is greater than 230 g is 0.93 (2 s.f.).


Example 8.4
The maximum load a lift can carry is 450 kg. The weights of men are normally distributed
with mean 60 kg and standard deviation 10 kg. The weights of women are normally
distributed with mean 55 kg and standard deviation 5 kg. Find the probability that the lift will
be overloaded by five men and two women, if their weights are independent.


Solution 8.4
Let M be the weight, in kilograms, of a man. Then M ~ N(60, 10²).
Let W be the weight, in kilograms, of a woman. Then W ~ N(55, 5²).
The lift will be overloaded if
M₁ + M₂ + M₃ + M₄ + M₅ + W₁ + W₂ > 450.
Let T = M₁ + M₂ + … + M₅ + W₁ + W₂
E(T) = 5E(M) + 2E(W)
= 300 + 110
= 410
Var(T) = 5 Var(M) + 2 Var(W)
= 500 + 50
= 550
Since M and W are normally distributed, T is also normal.
So T ~ N(410, 550).
P(lift is overloaded) = P(T > 450) = P(Z > (450 − 410)/√550)
= P(Z > 1.706)
= 1 − Φ(1.706)
= 0.0441

[Figure: normal curve for T marking the mean 410 and 450; corresponding Z values 0 and 1.706]

The probability that the lift will be overloaded by five men and two women is 0.044 (2 s.f.).
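A minimal numerical sketch of Solution 8.4 in Python (standard library only). Each of the seven people contributes their own variance, so the variances are multiplied by the head counts:

```python
from math import sqrt
from statistics import NormalDist

n_men, n_women = 5, 2
# Five separate men and two separate women: add one variance per person
mean_T = n_men * 60 + n_women * 55          # 410 kg
var_T = n_men * 10**2 + n_women * 5**2      # 550 kg^2
T = NormalDist(mean_T, sqrt(var_T))

p_overload = 1 - T.cdf(450)                 # P(T > 450)
print(mean_T, var_T, f"{p_overload:.3f}")   # 410 550 0.044
```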


THE DIFFERENCE OF INDEPENDENT NORMAL VARIABLES 


For two independent variables X and Y, where X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²),
E(X − Y) = E(X) − E(Y) = μ₁ − μ₂
Var(X − Y) = Var(X) + Var(Y) = σ₁² + σ₂²

X − Y is normally distributed, so

X − Y ~ N(μ₁ − μ₂, σ₁² + σ₂²)

(Remember the + sign.)


Example 8.5 


A machine produces rubber balls whose diameters are normally distributed with mean 

5.50 cm and standard deviation 0.08 cm. 

The balls are packed in cylindrical tubes whose internal diameters are normally distributed 
with mean 5.70 cm and standard deviation 0.12 cm.

If a ball, selected at random, is placed in a tube, selected at random, what is the distribution of 
the clearance? (The clearance is the internal diameter of the tube minus the diameter of the ball.) 
What is the probability that the clearance is between 0.05 cm and 0.25 cm? 


Solution 8.5 


Let B be the diameter, in centimetres, of a rubber ball. Then B ~ N(5.50, 0.08²).
Let T be the internal diameter, in centimetres, of a cylindrical tube. Then T ~ N(5.70, 0.12²).

Let C be the clearance, in centimetres, so C = T − B.

E(C) = E(T) − E(B) = 5.70 − 5.50 = 0.2
Var(C) = Var(T) + Var(B) = 0.12² + 0.08² = 0.0208

so C ~ N(0.2, 0.0208)

To find the probability that the clearance is between 0.05 cm and 0.25 cm, find
P(0.05 < C < 0.25) = P((0.05 − 0.2)/√0.0208 < Z < (0.25 − 0.2)/√0.0208)
= P(−1.040 < Z < 0.347)
= Φ(1.040) + Φ(0.347) − 1
= 0.8508 + 0.6357 − 1
= 0.4865

[Figure: normal curve for C marking 0.05, 0.2 and 0.25; corresponding Z values −1.040, 0 and 0.347]

The probability that the clearance is between 0.05 cm and 0.25 cm is 0.49 (2 s.f.).
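Python's `statistics.NormalDist` also supports subtracting one independent normal from another; the result has mean μ₁ − μ₂ but variance σ₁² + σ₂², exactly the + sign noted above. A minimal sketch of Solution 8.5:

```python
from statistics import NormalDist

ball = NormalDist(5.50, 0.08)  # ball diameter B, cm
tube = NormalDist(5.70, 0.12)  # tube internal diameter T, cm

# Subtraction of independent NormalDists: means subtract, variances still add
clearance = tube - ball
print(round(clearance.mean, 2), round(clearance.variance, 4))  # 0.2 0.0208

p = clearance.cdf(0.25) - clearance.cdf(0.05)  # P(0.05 < C < 0.25)
print(f"{p:.3f}")                               # about 0.486
```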


Example 8.6 


A certain liquid drug is marketed in bottles containing a nominal 20 ml of drug. Tests
on a large number of bottles indicate that the volume of liquid in each bottle is
distributed normally with mean 20.42 ml and standard deviation 0.429 ml.

If the capacity of the bottles is normally distributed with mean 21.77 ml and standard
deviation 0.210 ml, estimate what percentage of bottles will overflow during filling.


Solution 8.6 
X is the volume, in millilitres, of liquid and X ~ N(20.42, 0.429²).
Y is the capacity, in millilitres, of a bottle and Y ~ N(21.77, 0.210²).
The bottle will overflow if the quantity of liquid is greater than the capacity of the bottle,
i.e. if X > Y, so X − Y > 0.
Let D = X − Y
E(D) = E(X) − E(Y) = 20.42 − 21.77 = −1.35
Var(D) = Var(X) + Var(Y) = 0.429² + 0.210² = 0.2281 (assuming X and Y are independent)
D ~ N(−1.35, 0.2281)

P(D > 0) = P(Z > (0 − (−1.35))/√0.2281)
= P(Z > 2.827)
= 1 − Φ(2.827)
= 1 − 0.9976
= 0.0024

[Figure: normal curve for D marking the mean −1.35 and 0; corresponding Z values 0 and 2.827]

0.24% of bottles will overflow during filling.
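Because 0.24% is a small tail probability, a simulation is a useful sanity check on the sign of E(D) and on adding the variances. A sketch in Python, assuming (as the solution does) that volume and capacity are independent; the seed and number of trials are arbitrary:

```python
import random

random.seed(8)
trials = 400_000
overflows = 0
for _ in range(trials):
    volume = random.gauss(20.42, 0.429)    # X: ml of drug put into the bottle
    capacity = random.gauss(21.77, 0.210)  # Y: ml the bottle can hold
    if volume - capacity > 0:              # overflow when D = X - Y > 0
        overflows += 1

rate = overflows / trials
print(f"{rate:.4f}")  # should be close to 0.0024
```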


Example 8.7 


In a cafeteria, baked beans are served either in ordinary portions or in children’s portions. The
quantity given for an ordinary portion is a normal variable with mean 90 g and standard
deviation 3 g, and the quantity given for a child’s portion is a normal variable with mean 43 g
and standard deviation 2 g.
What is the probability that Tom, who has two children’s portions, is given more than his
father, who has an ordinary portion?


Solution 8.7 


C is the quantity, in grams, in a child’s portion. Then C ~ N(43, 2²).
A is the quantity, in grams, in an ordinary portion. Then A ~ N(90, 3²).
You need to find P(C₁ + C₂ > A), i.e. P(C₁ + C₂ − A > 0).
Let W = C₁ + C₂ − A
E(W) = E(C₁) + E(C₂) − E(A)
= 2E(C) − E(A)
= 86 − 90
= −4
Var(W) = Var(C₁) + Var(C₂) + Var(A)
= 2 Var(C) + Var(A)
= 8 + 9
= 17
So W ~ N(−4, 17)

P(W > 0) = P(Z > (0 − (−4))/√17)
= P(Z > 0.970)
= 1 − Φ(0.970)
= 1 − 0.8340
= 0.1660

[Figure: normal curve for W marking the mean −4 and 0; corresponding Z values 0 and 0.970]

The probability that Tom is given more than his father is 0.17 (2 s.f.).
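A minimal sketch of Solution 8.7 with Python's `statistics.NormalDist`. The two children's portions are written as two separate objects to stress that they are independent amounts; writing `2 * C1` instead would model one portion doubled, with variance 16 rather than 8.

```python
from statistics import NormalDist

C1 = NormalDist(43, 2)  # Tom's first child's portion, g
C2 = NormalDist(43, 2)  # his second portion: a separate, independent amount
A = NormalDist(90, 3)   # father's ordinary portion, g

# Every + or - between NormalDist objects adds the variances (independence assumed)
W = C1 + C2 - A
print(round(W.mean, 6), round(W.variance, 6))  # -4.0 17.0
print(f"{1 - W.cdf(0):.3f}")                   # P(W > 0) = 0.166
```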


1. X and Y are independent normal variables with


X ~ N(100, 49) and Y ~ N(110, 576).


(a) Find the mean and the standard deviation of
the distribution X + Y.

(b) Describe the distribution of X + Y.

(c) Find P(X + Y > 200).

(d) Find P(180 < X + Y < 240).


2. Each weekday Mr Harper goes to the local
library to read the newspapers. The time he
spends travelling is a normal variable with mean 
15 minutes and standard deviation 2 minutes. 
The time he spends in the library is normally 
distributed with mean 25 minutes and standard 
deviation 4 minutes. 

Find the probability that, on a particular day, Mr 
Harper 


(a) is away from the house for more than 
45 minutes, 

(b) spends more time travelling than in the 
library. 


3. Bolts are manufactured which are to fit in holes
in steel plates.

The diameter of the bolts is normally distributed
with mean 2.60 cm and standard deviation
0.03 cm. The diameter of the holes is normally
distributed with mean 2.71 cm and standard
deviation 0.04 cm.


(a) Verify that, if a bolt and a hole are selected
at random, the probability that the bolt is
too large to enter the hole is 0.0139.

(b) The random selection of a bolt and hole is
carried out five times. Find the probability
that in every case the bolt will be able to
enter the hole. (C)


4. The mass of a particular article follows a normal
distribution with mean 20 g and variance 4 g². A
random sample of 12 items is tested. Find the
probability that the total mass is less than 230 g.


5. Fiona, Carly, Jenny and Vicky swim in the
4 × 100 m freestyle relay team, with each one
swimming 100 m. The times in seconds taken by
each of the girls to swim 100 m are independent
normal variables, distributed as follows:


F ~ N(52.5, 0.3²), C ~ N(52.0, 0.6²),
J ~ N(53.5, 1.2²), V ~ N(51.5, 0.6²).
Calculate the probability that in a particular
race,


(a) Fiona will swim her leg in less than
52.5 seconds,
(b) the relay team will take longer than
3 minutes 31.3 seconds to swim the race,
(c) Carly will swim her leg faster than Vicky.


6. The mass, in grams, of a Chocolate Delight cake
is normally distributed with mean 20 g and 
standard deviation 2 g. The cakes are sold in 
packets of six and the mass of the packing 
material is normally distributed with a mean of 
30 g and a standard deviation of 4 g. 


(a) Find the probability that the mass of 
six cakes is less than 110 g. 
(b) Find the probability that the total mass of a 
packet containing six cakes is 
(i) more than 162 g, 
(ii) less than 137 g,
(iii) between 140 g and 153 g.


7. In a certain village, the heights of the women are
normally distributed with a mean of 164 cm and
a standard deviation of 5 cm. The heights of the 
men are normally distributed with a mean of 
173 cm and a standard deviation of 6 cm. 

A man and a woman are picked at random from 
the people in the village. 

Find the probability that 


(a) the woman is taller than the man, 
(b) the man is more than 5 cm taller than the 
woman. 


8. The mass of a certain grade of apple is normally
distributed with mean mass 120 g and standard
deviation 10 g.


(a) An apple of this grade is selected at random.
Find the probability that its mass lies
between 100.5 g and 124 g.

(b) Four apples of this grade are selected at
random. Find the probability that their total
mass exceeds 505 g.


9. Rods are produced in two lengths, called ‘short’
and ‘long’.
S is the length, in centimetres, of a short rod,
where S ~ N(5, 0.25).

L is the length, in centimetres, of a long rod,
where L ~ N(10, 1).
Rods are joined to give longer lengths. Find the
probability that a length consisting of


(a) two short rods and four long rods is longer
than 52 cm,

(b) three short rods and two long rods is
between 33 cm and 36 cm long,

(c) six short rods is longer than a length
consisting of three long rods.

10. Mr Smith has five dogs, two of which are male
and three are female. The masses of food they eat 
in any given week are normally distributed as 
follows: 


            Mean (kg)    Standard deviation (kg)
Male        3.5          0.4
Female      2.5          0.3


Find the probability that the two males eat more 
than the three females in a particular week. 


11. The time taken to carry out a standard service on
a car of type A is known, to a good
approximation, to be a normal variable with
mean 1 hour and standard deviation 10 minutes.
Assuming that only one car is serviced at a time,
find the probability that it will take more than
6.5 hours to service six cars.

The time taken to carry out a standard service on
a car of type B is a normal variable with mean
1.5 hours and standard deviation 15 minutes.
Find the probability that five cars of type B can
be serviced more quickly than eight cars of
type A.


12. The process of painting the body-work of a 
mass-produced lorry consists of giving it one 
coat of paint A, three coats of paint B and two 
coats of paint C. A record of the quantity of each 
type of paint used for each coat is kept for each 
lorry produced over a long period. The following 
table gives the means and standard deviations of 
these quantities measured in litres: 


                         Mean    Standard deviation
The coat of paint A      3.7     0.42
Each coat of paint B     1.3     0.15
Each coat of paint C     1.0     0.12


Remember that, for any constant 4, 


Assuming independence of the distribution for
each coat, calculate the mean and standard
deviation for the total quantity of paint used on
each lorry.

Assuming that the quantities of paint used for
each coat are normally distributed, calculate


(a) the percentage of lorries receiving less than
8.5 litres of paint,

(b) the percentage of lorries receiving more than
10.0 litres of paint. (C)


13. The values of two types of resistors are normally
distributed as follows:


Type A: mean 100 ohms; standard deviation
2 ohms

Type B: mean 50 ohms; standard deviation
1.3 ohms


(a) What tolerances would be permitted for
type A if only 0.5% were rejected?

(b) 300-ohm resistors are made by connecting
together three of the type A resistors, drawn
from the total production. What percentage
of the 300-ohm resistors may be expected to
have resistances greater than 295 ohms?

(c) Pairs of resistors, one of 100 ohms and one
of 50 ohms, drawn from the total
production for types A and B respectively,
are connected together to make 150-ohm
resistors. What percentage of the resulting
resistors may be expected to have resistances
in the range 150 ohms to 151.4 ohms?

(AEB)


14. The time of departure of my train from Temple
Meads Station is distributed normally about the
scheduled time of 08:25 with a standard
deviation of 1 minute. I arrive at Temple Meads
Station on another train whose time of arrival is
normally distributed about the scheduled time of
08:20 with standard deviation of 1 minute. It
takes me three minutes to change platforms.
If I miss the train from Temple Meads, I am late
for work.
(a) Find the probability that I am late for work.
(b) Find the probability that I miss the train
from Temple Meads Station every day from
Monday to Friday in a given week.


MULTIPLES OF INDEPENDENT NORMAL VARIABLES 


E(aX) = aE(X) (page 246) and Var(aX) = a² Var(X) (page 250).


If X is a normal variable such that X ~ N(μ, σ²),

then E(aX) = aE(X) = aμ
Var(aX) = a² Var(X) = a²σ²

It can be shown that aX is also normally distributed,

so aX ~ N(aμ, a²σ²)

Now consider two independent normal variables X and Y where X ~ N(μ₁, σ₁²) and
Y ~ N(μ₂, σ₂²).

For any constants a, b, using the results on page 403,
E(aX + bY) = aE(X) + bE(Y) = aμ₁ + bμ₂
E(aX − bY) = aE(X) − bE(Y) = aμ₁ − bμ₂
Var(aX + bY) = a² Var(X) + b² Var(Y) = a²σ₁² + b²σ₂²
Var(aX − bY) = a² Var(X) + b² Var(Y) = a²σ₁² + b²σ₂²

(Remember the + sign.)

aX + bY and aX − bY are also normally distributed, so

aX + bY ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²)
aX − bY ~ N(aμ₁ − bμ₂, a²σ₁² + b²σ₂²)


Example 8.8 


X and Y are independent random variables and X ~ N(100, 8), Y ~ N(55, 10). Find the
probability that an observation from the population of X is more than twice the value of an
observation from the population of Y.


Solution 8.8 


You need to find P(X > 2Y), i.e. P(X − 2Y > 0).
Let D = X − 2Y.

E(D) = E(X) − 2E(Y) = 100 − 110 = −10
Var(D) = Var(X) + 2² Var(Y) = 8 + 4 × 10 = 48

So D ~ N(−10, 48)

P(D > 0) = P(Z > (0 − (−10))/√48)
= P(Z > 1.443)
= 1 − Φ(1.443)
= 1 − 0.9255
= 0.0745

[Figure: normal curve for D marking the mean −10 and 0; corresponding Z values 0 and 1.443]

The probability that an observation from the population of X is more than twice the value of
an observation from the population of Y is 0.075 (2 s.f.).
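In Python's `statistics.NormalDist`, multiplying by a constant scales the standard deviation, so `2 * Y` has variance 2² × 10 = 40, matching the a² rule. A minimal sketch of Solution 8.8; note that the second parameter in N(100, 8) is a variance, so the code passes its square root:

```python
from math import sqrt
from statistics import NormalDist

X = NormalDist(100, sqrt(8))   # X ~ N(100, 8): pass sqrt of the variance
Y = NormalDist(55, sqrt(10))   # Y ~ N(55, 10)

D = X - 2 * Y                  # 2 * Y scales the SD by 2, so its variance by 4
print(round(D.mean, 6), round(D.variance, 6))  # -10.0 48.0
print(f"{1 - D.cdf(0):.4f}")                   # P(X > 2Y), about 0.0745
```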


Great care must be taken in distinguishing between a sum of random variables and a multiple
of a random variable.

For example, if X is the weight of a small loaf, then the sum X₁ + X₂ + X₃ is the total weight
of three small loaves.

If X ~ N(μ, σ²) then X₁ + X₂ + X₃ ~ N(3μ, 3σ²).

But if there is a large economy-size loaf which is three times the weight of a small loaf, then
the weight of an economy loaf is 3X (a multiple)

and 3X ~ N(3μ, 9σ²).

In general, for X ~ N(μ, σ²),

Sum: X₁ + X₂ + … + Xₙ ~ N(nμ, nσ²)
Multiple: nX ~ N(nμ, n²σ²)

Notice that the means are the same but the variances are not.
The distribution for the multiple is more spread out.
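The difference between a sum and a multiple shows up clearly in a simulation. In this sketch the small-loaf distribution N(400, 10²) grams, the seed and the number of trials are hypothetical values chosen only for illustration:

```python
import random
from statistics import pvariance

random.seed(3)
# Hypothetical small-loaf weights X ~ N(400, 10^2) grams; illustrative numbers only
mu, sigma, n, trials = 400, 10, 3, 100_000

# Sum: three *different* small loaves, X1 + X2 + X3
sums = [sum(random.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]
# Multiple: one economy loaf weighing three times *one* small loaf, 3X
multiples = [n * random.gauss(mu, sigma) for _ in range(trials)]

print(round(pvariance(sums)))       # close to n * sigma^2   = 300
print(round(pvariance(multiples)))  # close to n^2 * sigma^2 = 900
```

In the sum, high and low loaves partly cancel out; in the multiple, a single deviation is tripled, which is why its variance is n² rather than n times σ².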


Look carefully at the following example. 


Example 8.9 
A soft drinks manufacturer sells bottles of drinks in two sizes. The amount in each bottle, in
millilitres, is normally distributed as shown in the table:

           Mean (ml)    Variance (ml²)
Small      252          4
Large      1012         25

(a) A bottle of each size is selected at random. Find the probability that the large bottle
contains less than four times the amount in the small bottle.

(b) One large and four small bottles are selected at random. Find the probability that the
amount in the large bottle is less than the total amount in the four small bottles.


Solution 8.9 


Let S be the amount, in millilitres, in a small bottle. Then S ~ N(252, 4).
Let L be the amount, in millilitres, in a large bottle. Then L ~ N(1012, 25).


(a) To find the probability that the large bottle contains less than four times the amount in a
small bottle, you need P(L < 4S),
i.e. P(L − 4S < 0).
Now E(L − 4S) = E(L) − E(4S)
= E(L) − 4E(S)
= 1012 − 1008
= 4

Var(L − 4S) = Var(L) + Var(4S)
= Var(L) + 16 Var(S)
= 25 + 64
= 89

So L − 4S ~ N(4, 89)

P(L − 4S < 0) = P(Z < (0 − 4)/√89)
= P(Z < −0.424)
= 1 − Φ(0.424)
= 1 − 0.6642
= 0.3358

[Figure: normal curve for L − 4S marking 0 and the mean 4; corresponding Z values −0.424 and 0]

The probability that a large bottle contains less than four times the amount in a small
bottle is 0.34 (2 s.f.).


(b) To find the probability that the amount in a large bottle is less than the total amount in
four small bottles you need P(L < S₁ + S₂ + S₃ + S₄) = P(L − (S₁ + S₂ + S₃ + S₄) < 0).

E(L − (S₁ + … + S₄)) = E(L) − E(S₁ + … + S₄)
= E(L) − 4E(S)
= 1012 − 1008
= 4

Var(L − (S₁ + … + S₄)) = Var(L) + Var(S₁ + … + S₄)   (Remember the + sign.)
= Var(L) + 4 Var(S)   (sum of independent normal variables)
= 25 + 16
= 41

Therefore L − (S₁ + … + S₄) ~ N(4, 41)

P(L − (S₁ + … + S₄) < 0) = P(Z < (0 − 4)/√41)
= P(Z < −0.625)
= 1 − Φ(0.625)
= 0.266

[Figure: normal curve for L − (S₁ + … + S₄) marking 0 and the mean 4; corresponding Z values −0.625 and 0]

The probability that a large bottle contains less than the total amount in four small bottles
is 0.27 (2 s.f.).


It is very important to distinguish between
the multiple of S in part (a) and
the sum of S₁, S₂, S₃, S₄ in part (b).
Note that E(L − 4S) = 4
E(L − (S₁ + S₂ + S₃ + S₄)) = 4


Var(L − 4S) = 89
Var(L − (S₁ + S₂ + S₃ + S₄)) = 41


The means are the same.


The variances are different.
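Both parts of Example 8.9 can be reproduced side by side with Python's `statistics.NormalDist`. Its arithmetic always treats the operands as independent draws, so `S + S + S + S` models four different small bottles, while `4 * S` scales a single bottle. A minimal sketch:

```python
from statistics import NormalDist

S = NormalDist(252, 2)    # small bottle: variance 4, so SD 2
L = NormalDist(1012, 5)   # large bottle: variance 25, so SD 5

# (a) a multiple: 4S has variance 4^2 * 4 = 64
part_a = L - 4 * S
# (b) a sum of four independent small bottles: variance 4 * 4 = 16
four_small = S + S + S + S
part_b = L - four_small

for label, d in (("(a)", part_a), ("(b)", part_b)):
    print(label, round(d.mean, 6), round(d.variance, 6), f"{d.cdf(0):.3f}")
# (a) 4.0 89.0 0.336
# (b) 4.0 41.0 0.266
```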


Exercise 8b 


1. X and Y are independent normal variables such
that X ~ N(40, 12) and Y ~ N(60, 15). Find


(a) P(2X + Y > 130)
(b) P(3X − 2Y < 20)


2. The time taken by Simon to do his Mathematics 
homework can be modelled by a normal 
distribution with mean 50 minutes and standard 
deviation 10 minutes. The time taken by Belinda 
is N(30, 25). 


(a) Find the probability that, for a particular 
homework, Simon takes more than twice as 
long as Belinda. 

(b) Find the probability that Belinda spends less
time in total on Monday’s homework and 
Thursday’s homework than Simon spends 
on Monday’s homework. 




3. The thickness, P cm, of a randomly chosen 
paperback book may be regarded as an 
observation from a normal distribution with 
mean 2.0 and variance 0.730. 
The thickness, H cm, of a randomly chosen 
hardback book may be regarded as an 
observation from a normal distribution with 
mean 4.9 and variance 1.920. 


(a) Determine the probability that the combined
thickness of four randomly chosen
paperbacks is greater than the combined
thickness of two randomly chosen
hardbacks.
(b) By considering X = 2P − H, or otherwise,
determine the probability that a randomly
chosen paperback is less than half as thick
as a randomly chosen hardback.

(c) Determine the probability that a randomly
chosen collection of 16 paperbacks and 8
hardbacks will have a combined thickness of
less than 70 cm.


(Give three decimal places in your answers.) (C) 


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4. The random variable X is distributed normally 6. Next May, an ornithologist intends to trap one 


with mean yz and variance 6, and the random 
variable Y is normally distributed with mean 8 
and variance 0°. 

2X — 3Y is distributed normally with mean -12 
and variance 42, Find 


(a) the value of gz and the value of o, 
(b) P(X > 8), 

(c) P(Y<9), 

id} P(-4<3X-2Y<7). 


5, A single observation is taken from each of the 
distributions 
A ~ N(82, 1.52), B~ N(42, 0.37) and 
C~ N(85, 0.72) 
Find the probability that the mean of these 
observations, }(A + B + C), is greater than 70. 


Summary 


_ @ For two independent normal variables such that 


X- Nusa?) and ¥~ Nin, 07) 
KEY - NW +o, o¢ +07) 
KX -Y- Nuys or ton) 


mafe cuckoo and one female cuckoo. The mass i Miscellaneous worked exam ples 


M of the male cuckoo may be regarded as being 
a normal random variable with mean 116 g and 
standard deviation 16 g. The mass F of the 
female cuckoo may be regarded as being 
independent of M and as being a normal random 
variable with mean 106 g and standard deviation 
12 g. Determine 


(a) the probability that the mass of the two 
birds together will be more than 230 g, 

(b) the probability that the mass of the male 
will be more than the mass of the female. 


By considering X = 9M - 16F, or otherwise, 
determine the probability that the mass of the 
female will be less than nine-sixteenths of that of 
the male. 

Suppose that one of the two trapped birds 


Example 8.10 


The distribution of the masses of adult husky d. 
he distr > y dogs may b 
distribution with mean 37 kg and standard dae $ ie ee eee 


(a) Calculate the probability that an adult husky h 
ili y has a mass greater than 30 kg. 
(b) Calculate the probability that a randomly chosen team of six sasbies has ee mass 


(NEAB} 


lying between 198 kg and 240 kg, giving your answer to three decimal places 


Solution 8.10 
His the mass, in kilograms, of a husky dog. Then H ~ N(37, 5”). 


escapes. Assuming that the remaining bird will 30 - 37 
be equally likely to be the male or the female, (a) P(H > 30)=P{Z > 
determine the probability that its mass will be 5 
more than 118 g. (C) = P(Z>~1.4) 
= (1.4) ” 
H: 
= 0.9192 Z: ai ° 


‘Bor n independent normal variables such that X, ~ N(jts 7) 


Xt Xt 4 XN to to te op top+--+0,) 

e For 7 independent observations of the random variable X where X ~ N(w, 07), 
X, +X) +--+ X,~-NQw, no’) 

e For the normal variable such that X ~ N(u, 07), and for any constant a 
aX ~ N(au, a707) 

_@ For two independent normal variables such that 
Xe NGji,,02) and Y ~ N(4;,07°) and for any constants a and b 

aX + bY ~ Nlau, + by, op +6707) 
aX — bY ~ Niau, - bu), 4702 +6707) 


The probability that an adult husky dog has a mass greater than 30 kg is 0.919 (3 d.p.). 
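(b) Let $T = H_1 + H_2 + \cdots + H_6$, where $H_1, \ldots, H_6$ are the masses of the six dogs.

$E(T) = 6E(H) = 6 \times 37 = 222$ and $\text{Var}(T) = 6\,\text{Var}(H) = 6 \times 25 = 150$ (assuming the masses of the dogs are independent).

$\therefore T \sim N(222, 150)$

$$
\begin{aligned}
P(198 < T < 240) &= P\left(\frac{198 - 222}{\sqrt{150}} < Z < \frac{240 - 222}{\sqrt{150}}\right) \\
&= P(-1.960 < Z < 1.470) \\
&= \Phi(1.960) + \Phi(1.470) - 1 \\
&= 0.9750 + 0.9292 - 1 \\
&= 0.9042
\end{aligned}
$$

The probability that six huskies have a total mass lying between 198 kg and 240 kg is 0.904 (3 d.p.).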
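Both parts of Example 8.10 can be checked with Python's `statistics.NormalDist` (Python 3.8+). This is a sketch for checking answers, not a replacement for the table-based working. Adding two `NormalDist` objects treats them as independent, so `sum([H] * 6)` models six different dogs, whereas `6 * H` rescales the mass of one dog.

```python
from statistics import NormalDist

# Mass of one adult husky in kg: H ~ N(37, 5^2); NormalDist takes mu and sigma
H = NormalDist(mu=37, sigma=5)

# (a) P(H > 30)
p_a = 1 - H.cdf(30)

# (b) T = H1 + ... + H6 for six different dogs.
# Adding NormalDist objects assumes independence, so Var(T) = 6 * 25 = 150.
T = sum([H] * 6)
p_b = T.cdf(240) - T.cdf(198)

# Contrast: 6 * H is six times the mass of ONE dog, so its variance is 36 * 25 = 900.
six_times_one = 6 * H

print(f"P(H > 30) = {p_a:.4f}")
print(f"T ~ N({T.mean:.0f}, {T.variance:.0f}); P(198 < T < 240) = {p_b:.4f}")
print(f"6H has variance {six_times_one.variance:.0f}")
```

The small differences from the table values come only from rounding $z$ to three decimal places in the hand calculation.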
Example 8.11
The lifetimes of Econ light bulbs are normally distributed with mean 1000 h and standard deviation 25 h.
(a) Find, to three decimal places, the probability that an Econ light bulb will have a lifetime between 975 h and 1020 h.
(b) Calculate, to three decimal places, the probability that the sum of the lifetimes of eight Econ light bulbs will exceed 7930 h. Indicate clearly the stage in your calculation when an assumption concerning independence is essential.
The lifetimes of Enersaver light bulbs are normally distributed with mean 7900 h and standard deviation 50 h.
(c) Calculate, to three decimal places, the probability that an Enersaver light bulb will last at least eight times as long as an Econ light bulb. (NEAB)


Solution 8.11
$X$ is the lifetime, in hours, of an Econ light bulb. Then $X \sim N(1000, 25^2)$.

(a)
$$
\begin{aligned}
P(975 < X < 1020) &= P\left(\frac{975 - 1000}{25} < Z < \frac{1020 - 1000}{25}\right) \\
&= P(-1 < Z < 0.8) \\
&= \Phi(1) + \Phi(0.8) - 1 \\
&= 0.8413 + 0.7881 - 1 \\
&= 0.6294
\end{aligned}
$$

The probability that an Econ light bulb has a lifetime between 975 h and 1020 h is 0.629 (3 d.p.).
(b) $S$ is the sum of the lifetimes of eight Econ light bulbs, so $S = X_1 + X_2 + \cdots + X_8$.

$E(S) = 8E(X) = 8000$
$\text{Var}(S) = 8\,\text{Var}(X) = 8 \times 25^2 = 5000$ (assuming the lifetimes are independent)

$S \sim N(8000, 5000)$

$$
\begin{aligned}
P(S > 7930) &= P\left(Z > \frac{7930 - 8000}{\sqrt{5000}}\right) \\
&= P(Z > -0.990) \\
&= \Phi(0.990) \\
&= 0.8389
\end{aligned}
$$

The probability that the sum of the lifetimes of eight Econ light bulbs exceeds 7930 h is 0.839 (3 d.p.).
(c) $Y$ is the lifetime of an Enersaver light bulb and $Y \sim N(7900, 50^2)$.

$P(Y > 8X)$ is needed, i.e. $P(Y - 8X > 0)$.

$E(Y - 8X) = E(Y) - 8E(X) = 7900 - 8000 = -100$

$\text{Var}(Y - 8X) = \text{Var}(Y) + 8^2\,\text{Var}(X) = 50^2 + 64 \times 25^2 = 42\,500$ (assuming independence)

$Y - 8X \sim N(-100, 42\,500)$

$$
\begin{aligned}
P(Y - 8X > 0) &= P\left(Z > \frac{0 - (-100)}{\sqrt{42\,500}}\right) \\
&= P(Z > 0.485) \\
&= 1 - \Phi(0.485) \\
&= 1 - 0.6862 \\
&= 0.3138
\end{aligned}
$$

The probability that an Enersaver light bulb lasts at least eight times as long as an Econ light bulb is 0.314 (3 d.p.).
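The contrast between parts (b) and (c) of Example 8.11 (eight different bulbs versus eight times one bulb) can be checked with Python's `statistics.NormalDist`. This is a sketch, assuming Python 3.8 or later; `NormalDist + NormalDist` and `NormalDist - NormalDist` assume independence, while `8 * X` rescales a single variable.

```python
from statistics import NormalDist

X = NormalDist(1000, 25)   # lifetime of one Econ bulb, hours
Y = NormalDist(7900, 50)   # lifetime of one Enersaver bulb, hours

# (a) P(975 < X < 1020)
p_a = X.cdf(1020) - X.cdf(975)

# (b) S = X1 + ... + X8: eight different bulbs, so the variances add
# (this is where independence is needed): Var(S) = 8 * 25^2 = 5000.
S = sum([X] * 8)
p_b = 1 - S.cdf(7930)

# (c) Y - 8X: one Econ bulb's lifetime scaled by 8, so Var(8X) = 64 * 25^2.
D = Y - 8 * X
p_c = 1 - D.cdf(0)

print(f"(a) {p_a:.3f}  (b) {p_b:.3f}  (c) {p_c:.3f}")
print(f"Var(S) = {S.variance:.0f}, Var(Y - 8X) = {D.variance:.0f}")
```

Writing `8 * X` in part (b) instead of `sum([X] * 8)` would give a variance of 40 000 rather than 5000, which is exactly the mistake the textbook warns against.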


Miscellaneous exercise 8c 


1. The weights of grade A oranges are normally distributed with mean 200 g and standard deviation 12 g. Determine, correct to two significant figures, the probability that

(a) a grade A orange weighs more than 190 g but less than 210 g,
(b) a sample of 4 grade A oranges weighs more than 820 g.

The weights of grade B oranges are normally distributed with mean 175 g and standard deviation 9 g. Determine, correct to two significant figures, the probability that

(c) a grade B orange weighs less than a grade A orange,
(d) a sample of 8 grade B oranges weighs more than a sample of seven grade A oranges. (C)


2. Prints from two types of film C and D have developing times which can be modelled by normal variables, C with mean 16.18 s and standard deviation 0.11 s and D with mean 15.88 s and standard deviation 0.10 s.

(a) What is the probability that a type C print will take less than 16 s to develop?

(b) A type C print is developed and immediately afterwards a type D print is developed. What is the probability that the total time is greater than 32.5 s?

(c) What is the probability of a type C print taking longer to develop than a type D print?


3. In testing the length of life of electric light bulbs of a particular type, it is found that 12.3% of the bulbs tested fail within 800 hours and that 28.1% are still operating 1100 hours after the start of the test.

Assuming that the distribution of the length of life is normal, calculate, to the nearest hour in each case, the mean, μ, and the standard deviation, σ, of the distribution.

A light fitting takes a single bulb of this type. A packet of three bulbs is bought, to be used one after the other in this fitting. State the mean and variance of the total life of the 3 bulbs in the packet in terms of μ and σ and calculate, to two decimal places, the probability that the total life is more than 3300 hours.

Calculate the probability that all 3 bulbs have lives in excess of 1100 hours, so that again the total life is more than 3300 hours. Explain why this answer should be different from the previous one. (NEAB)


4. The weight of a large loaf of bread is a normal variable with mean 420 g and standard deviation 30 g. The weight of a small loaf of bread is a normal variable with mean 220 g and standard deviation 10 g.

(a) Find the probability that 5 large loaves weigh more than 10 small loaves.
(b) Find the probability that the total weight of 5 large loaves and 10 small loaves lies between 4.25 kg and 4.4 kg. (C)
5. The tensile strengths, measured in newtons (N), of a large number of ropes of equal length are independently and normally distributed such that 5% are under 706 N and 5% over 1294 N.

Four such ropes are randomly selected and joined end-to-end to form a single rope; the strength of the combined rope is equal to the strength of the weakest of the 4 selected ropes. Derive the probabilities that this combined rope will not break under tensions of 1000 N and 900 N, respectively.

A further 4 ropes are randomly selected and attached between two rings, the strength of the arrangement being the sum of the strengths of the 4 separate ropes. Derive the probabilities that this arrangement will break under tensions of 4000 N and 4200 N, respectively. (NEAB)


6. X and Y are independent normally distributed 
random variables such that X has mean 32 and 
variance 25, and Y has mean 43 and variance 96. 
Find 
(a) P(X > 43), 

(b) P(X - Y > 0),

(c) P(2X - Y > 0). (NEAB)

7. The times taken by two runners A and B to run 
400 m races are independent and normally 
distributed with means 45.0 s and 45.2 s, and 
standard deviations 0.5 s and 0.8 s respectively. 
The two runners are to compete in a 400 m race
for which there is a track record of 44.5 s. 


(a) Calculate, to three decimal places, the 
probability of runner A breaking the track 
record. 

(b) Show that the probability of runner B 
breaking the track record is greater than 
that of runner A. 

(c) Calculate, to three decimal places, the probability of runner A beating runner B. (NEAB)


8. In a packaging factory, the empty containers for a certain product have a mean weight of 400 g with a standard deviation of 10 g. The mean weight of the contents of a full container is 800 g with a standard deviation of 15 g. Find the expected total weight of 10 full containers and the standard deviation of this weight, assuming that the weights of containers and contents are independent.

Assuming further that these weights are normally distributed random variables, find the proportion of batches of 10 full containers which weigh more than 12.1 kg. (O & C)


9. Jam is packed into tins of advertised weight 1 kg. The weight of a randomly selected tin of jam is normally distributed about a target weight with a standard deviation of 12 g.

(a) If the target weight is 1 kg, find the probability that a randomly chosen tin weighs

(i) less than 985 g,

(ii) between 970 g and 1015 g.

(b) If not more than one tin in 100 is to weigh less than the advertised weight, find the minimum target weight required to meet this condition.

(c) The target weight is fixed at 1 kg. The resulting tins are packed in boxes of six and the weight of the box is normally distributed with mean weight 250 g and standard deviation 10 g. Find the probability that a randomly chosen box of 6 tins will weigh less than 6.2 kg. (L)


10. (a) The lifetime in hours of an electrical component has a normal distribution with mean 150 hours and standard deviation 8 hours.

Find the probability that

(i) a new component lasts at least 160 hours,

(ii) a component which has already operated for 145 hours will last at least another 15 hours.

(b) The weight of these components is normally distributed with mean 250 g and standard deviation 10 g. Each component is in its own box, the weight of which is also normally distributed with mean 50 g and standard deviation 5 g. There are 10 boxed components to a carton and the weight of the carton is normally distributed with mean 75 g and standard deviation 7 g.

Find the probability that a carton of 10 boxed components weighs less than 3 kg. (L)


11. Jim Longlegs is an athlete whose specialist event is the triple jump. This is made up of a hop, a step and a jump. Over a season the lengths of the hop, step and jump sections, denoted by H, S and J respectively, are measured, from which the following models are proposed:

$H \sim N(5.5, 0.5^2)$, $S \sim N(5.1, 0.6^2)$, $J \sim N(6.2, 0.8^2)$

where all distances are in metres. Assume that H, S and J are independent.

(a) In what proportion of his triple jumps will Jim's total distance exceed 18 m?

(b) In 6 successive independent attempts, what is the probability that at least one total distance will exceed 18 m?

(c) What total distance will Jim exceed 95% of the time?

(d) Find the probability that, in Jim's next jump, his step will be greater than his [...]


12. [In this question give three places of decimals in each answer.]

The mass of tea in ‘Supacuppa’ teabags has a normal distribution with mean 4.1 g and standard deviation 0.12 g. The mass of tea in ‘Bumpacuppa’ teabags has a normal distribution with mean 5.2 g and standard deviation 0.15 g.

(a) Find the probability that a randomly chosen Supacuppa teabag contains more than 4.0 g of tea.

(b) Find the probability that, of 2 randomly chosen Supacuppa teabags, one contains more than 4.0 g of tea and one contains less than 4.0 g of tea.

(c) Find the probability that 5 randomly chosen Supacuppa teabags contain a total of more than 20.8 g of tea.

(d) Find the probability that the total mass of tea in 5 randomly chosen Supacuppa teabags is more than the total mass of tea in 4 randomly chosen Bumpacuppa teabags. (C)


13. A small bank has two cashiers dealing with customers wanting to withdraw or deposit cash. For each cashier, the time taken to deal with a customer is a random variable having a normal distribution with mean 150 s and standard deviation 45 s.

(a) Find the probability that the time taken for a randomly chosen customer to be dealt with by a cashier is more than 180 s.

(b) One of the cashiers deals with two customers, one straight after the other. Assuming that the times for the customers are independent of each other, find the probability that the total time taken by the cashier is less than 200 s.

(c) At a certain time, one cashier has a queue of 4 customers and the other cashier has a queue of 3 customers, and the cashiers begin to deal with the customers at the front of their queues. Assuming that the cashiers work independently, find the probability that the 4 customers in the first queue will all be dealt with before the 3 customers in the second queue are all dealt with. (C)


Mixed test 8A


1. A country baker makes biscuits whose masses are normally distributed with mean 30 g and standard deviation 2.3 g. She packs them by hand into either a small carton (containing 20 biscuits) or a large carton (containing 30 biscuits).

(a) State the distribution of the total mass, S, of biscuits in a small carton and find the probability that S is greater than 615 g.

(b) Six small and four large cartons are placed in a box. Find the probability that the total mass of biscuits in the 10 cartons lies between 7150 g and 7250 g.

(c) Find the probability that 3 small cartons contain at least 25 g more than 2 large ones.

The label on a large carton of biscuits reads ‘Net mass 900 g’. A trading standards officer insists that 90% of such cartons should contain biscuits with a total mass of at least 900 g.

(d) Assuming the standard deviation remains unchanged, find the least value of the mean mass of a biscuit consistent with this requirement. (MEI)


2. Foster's Fancy Cakes are sold in packets of six. The mass of each cake is a normally distributed random variable having mean 25 g and standard deviation 0.4 g. The mass of the packaging is a normally distributed random variable having mean 20 g and standard deviation 1 g. Find, to three decimal places, the probabilities that

(a) the mass of a randomly chosen cake is between 24.7 g and 25.7 g,

(b) the total mass of a randomly chosen packet is less than 173 g.

State one assumption that you have made in answering (b).


3. Monto sherry is sold in bottles of two sizes: standard and large. For each size, the content, in litres, of a randomly chosen bottle is normally distributed with mean and standard deviation as given in the table.

                    Mean     Standard deviation
Standard bottle     0.760    0.008
Large bottle        1.010    0.009

(a) Show that the probability that a randomly chosen standard bottle contains less than 0.750 litres is 0.1056, correct to four places of decimals.

(b) Find the probability that a box of 10 randomly chosen standard bottles contains at least 3 bottles whose contents are each less than 0.750 litres. Give three significant figures in your answer.

(c) Find the probability that there is more sherry in 4 randomly chosen standard bottles than in 3 randomly chosen large bottles. (C)


(NEAB) 
Mixed test 8B


1. The continuous random variables X and Y 
represent the masses of male and female students 
who attend my local College. 

Both X and Y are normally distributed such that $X \sim N(75, 6^2)$ and $Y \sim N(65, 5^2)$, where all masses are given in kilograms.


(a) Find the probability that, if a male student 
and a female student are chosen at random, 
they both have a mass exceeding 70 kg. 

(b) State carefully the distribution of the 
combined mass of a random sample of 
m male and f female students. 

A lift in the college has a notice 


MAXIMUM 8 PEOPLE or 650 kg 


Find the probability that the combined mass 
of a random sample of 8 students will 
exceed the mass restriction if it consists of 
(i) 8 males, 

(ii) 5 males and 3 females. 

(c) What is the probability that a randomly 
selected female student has a greater mass 
than a randomly selected male student? 

(MEI) 


2. The mass of a cheese biscuit has a normal 
distribution with mean 6 g and standard 
deviation 0.2 g. Determine the probability that 


(a) a collection of twenty-five cheese biscuits 
has a mass of more than 149 g, 

(b) a collection of 30 cheese biscuits has a mass of less than 180 g,

(c) twenty-five times the mass of a cheese biscuit is less than 149 g.

The mass of a ginger biscuit has a normal distribution with mean 10 g and standard deviation 0.3 g. Determine the probability that a collection of 7 cheese biscuits has a mass greater than a collection of 4 ginger biscuits.

(It may be assumed that all the biscuits were sampled at random from their respective populations.) (C)


3. Certain components for a revolutionary new sewing machine are assembled by inserting a part of one type (sprotsil) into a part of another type (weavil). Sprotsils have external dimensions which are normally distributed with mean 2.50 cm and standard deviation 0.018 cm. Weavils have internal dimensions which are normally distributed with mean 2.54 cm and standard deviation 0.024 cm. Under suitable pressure, the two types fit together satisfactorily if the dimensions differ by not more than ±0.035 cm. Show that, if pairs of parts are chosen at random, the difference

D = internal dimension of a weavil - external dimension of a sprotsil

is distributed with mean 0.04 cm and standard deviation 0.030 cm. Hence show that approximately 42.8% of randomly selected pairs will fit together satisfactorily. Now, if it is known that the internal dimension of a given weavil is 2.517 cm, what is the probability that a randomly chosen sprotsil will fit this weavil satisfactorily? (AEB)


Sampling and estimation 


In this chapter you will learn about

• sampling methods including random and non-random sampling
• how to simulate a random sample from a given distribution
• the expectation and variance of the sample mean
• the distribution of the sample mean
• the use of the central limit theorem
• the distribution of the sample proportion
• estimates of population parameters:
  — mean
  — variance
  — proportion
• confidence intervals for:
  — a population mean, involving the z-distribution
  — a population mean, involving the t-distribution
  — a population proportion


SAMPLING 


Population 


In a statistical enquiry you often need information about a particular group. This group is 
known as the population or the target population, and it could be small, large or even infinite. 
Note that the word ‘population’ does not necessarily mean ‘people’. 

Here are some examples of populations: 


— pupils in a class,
— people in England in full time employment,
— hospitals in Wales,
— cans of soft drink produced in a factory,
— ferns in a wood,
— rational numbers between 0 and 10.


SURVEYS 


Information is collected by means of a survey. There are two types: 


(a) a census, 
(b) a sample survey. 


(a) Census 


In a census every member of the population is surveyed. 


When the population is small, this could be a straightforward exercise. For example, it would 
be easy to find out how each pupil in a class travelled to school on a particular morning. 
When populations are large, taking a census can be very time consuming and difficult to do 
with accuracy. Each year the government carries out a census in schools on the third Thursday
in January. This requests the number of boys and girls in each age group on the roll of every 
school in the country. Its accuracy, though, relates only to that day. Even more difficult to 
carry out accurately is the population census taken every ten years. This attempts to provide 
details of different age groups for every area in Britain. When populations are very large, or 
infinite, it is not possible to survey every member. 


On occasions it would not be sensible to survey every member. For example, if you performed 
a census to establish the length of life of a particular brand of light bulb, you would test each 
bulb until it failed and so you would destroy the population! 


(b) Sample survey


When a survey covers less than 100% of the population, it is known as a sample survey. In many circumstances, taking a sample is preferable to carrying out a census. Sample data can be obtained relatively cheaply and quickly and, if the sample is representative of the population, a sample survey can give an accurate indication of the population characteristic being studied.


The size of the sample does not depend on the size of the population. It often depends on the 
time and money available to collect information. Note that large samples are more likely to 
give more reliable information than small ones. The next time that you read the results of a 
public opinion poll in the newspaper, look at the size of the sample: it is usually over 1000.


Sample design 


Once the purpose of a survey has been stated precisely, the target population must be defined, 
for example 
— all the primary schools in England, 


— all the oak trees in Hampshire, 
— all the people admitted to the General Hospital in January suffering from a heart attack. 


The sampling units must be defined clearly. These are the people or items to be sampled, for example


— the primary school, 
— the oak tree, 
— the person suffering from a heart attack. 


If the sampling units within a population are individually named or numbered to form a list, then this list of sampling units is called a sampling frame. It could take various forms (e.g. a list, a map, a set of maps), and should be as accurate as possible.


Ideally the sampling frame should be the same as the target population. For example, if the target population is all the first year students in a college, then the sampling frame (the college register) and the target population should be the same, provided that the register is up-to-date and accurate. A sampling frame for people in Britain eligible to vote, however, is more difficult to form. The electoral register attempts to list all those who are eligible to vote throughout the country, but it is never completely accurate, since many changes occur during the time that the information is being processed. Some people do not return the forms, people move out of the area, people die, etc.


In some instances it is not possible to enumerate all the population, for example, the fish in a lake.


Example 9.1 


(a) Explain briefly what you understand by 
(i) a population, 
(ii) a sampling frame. 


(b) A market research organisation wants to take a sample of 
(i) owners of diesel motor cars in the UK, 
(ii) persons living in Oxford who suffered from injuries to the back during July 1996. 


Suggest a suitable sampling frame in each case. (L) 


Solution 9.1 


(a) (i) A population is a particular group of individuals or items.
(ii) Once the individual members of a population have been numbered to form a list, this list is called a sampling frame.


(b) (i) The list of registered owners as kept by DVLA in Swansea. 
(ii) A list made from information supplied by Health Clinics in Oxford during July 1996. 


Bias 


The purpose of sampling is to gain information about the whole population by selecting a 
sample from that population. You want the sample to be representative of the population so 
you must give every member of the population an equal chance of being included in the 
sample. This should eliminate any bias in the selection of the sample. 


Sources of bias include 


(a) the lack of a good sampling frame:
— using the telephone directory misses all those who do not have a telephone or whose number is ex-directory,
— using the electoral register in a city area misses the more mobile section of the population.

(b) the wrong choice of sampling unit:
— choosing an individual rather than a particular group such as ‘household’.

(c) non-response by some of the chosen units:
— it might be difficult to locate the particular unit,
— the cooperation of the respondent might not have been obtained,
— the enquiry might not have been understood, for example, a questionnaire might have been badly designed. Questionnaires should be clear, specific, unambiguous and easily understood. Questions should be worded neutrally, especially in opinion surveys, to avoid bias caused by pointing towards a particular response.

(d) bias introduced by the person conducting the survey:
— the interviewer might not question someone who appears uncooperative,
— the style of questioning may influence the response.


It should be noted that a sample can only be representative of the population from which it is 
selected. If you select a sample of teachers from one school, the sample is representative of the 
teachers in that school, not of all teachers in all schools. 


SAMPLING METHODS 


Once a sampling frame has been established, you can choose a method of sampling. These fall 
into two categories: 


• random sampling, e.g. simple, systematic, stratified;
• non-random sampling, e.g. quota, cluster.


Simple random sampling 


Suppose a population consists of N sampling units and you require a sample of n of these units. A sample of size n is called a simple random sample if all possible samples of size n are equally likely to be selected. Some form of random process must be used to make the selection.


If the unit selected at each draw is replaced into the population before the next draw, then it 
can appear more than once in the sample. This is known as sampling with replacement. 


If the unit selected at each draw is not replaced into the population before the next draw, this 
is known as sampling without replacement. 
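Python's standard `random` module happens to offer both kinds of draw, which makes the difference easy to see. In this sketch the frame of 20 numbered members is hypothetical; `random.sample` draws without replacement, so every possible set of 8 members is equally likely, while `random.choices` draws with replacement.

```python
import random

population = list(range(1, 21))  # hypothetical sampling frame: members numbered 1 to 20

random.seed(7)  # fixed seed so the example is reproducible

# Without replacement: each member can appear at most once.
without = random.sample(population, 8)

# With replacement: every draw is from the full population, so repeats can occur.
with_repl = random.choices(population, k=8)

print("without replacement:", without)
print("with replacement:   ", with_repl)
```

Running the second line several times with different seeds will sooner or later show a repeated member, which can never happen in the first.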


The second method, sampling without replacement, is known as simple random sampling. Two methods of simple random sampling are commonly used:


• drawing lots,
• random number sampling.


For each, make a list of all N members of the population and give each member a different number.


Drawing lots 


For each member, place a coloured ball into a container and then draw the balls at random and without replacement. If you wanted a sample of 20, you would draw out 20 balls. This is suitable for a small population. Note, however, that the sample must be large enough to provide sufficiently accurate information about the population.


The sample should be selected at random. Any hint of possible bias should be avoided.


If the population is large then the method of drawing lots, sometimes described as ‘drawing names out of a hat’, is not practical. You could instead make the choice by referring to random number tables. For your reference, a set is printed on page 653.


Using random number tables 


Random number tables consist of lists of digits 0, 1, 2, 3, ..., 9, such that each digit has an equal chance of occurring, so, for example, the probability that a 3 occurs is 0.1. In random number tables the digits may appear singly or be grouped in some way. This is solely for ease of reading.


Example 9.2 


Here is an extract from a set of random number tables 


6872538159 
2534705495 
3268744705 


Use it to select a random sample of 


(a) eight people from a group of 100 people, 
(b) eight people from a group of 60. 


Solution 9.2 


(a) To select a sample of eight people from a target population of 100 people, allocate a two-digit number to each person, for example allocate 01 to the first on the list, 02 to the second, ... up to 98, 99, 00, calling the hundredth person 00 for convenience.


Using the list, starting at the beginning of the first row and reading along the row, you would select people corresponding to the following numbers:


68 72 53 81 59 25 34 70


Alternatively, you could decide to read the digits backwards, from the bottom right, in which case your sample would consist of people corresponding to the numbers


50 74 47 86 23 59 45 07 


(b) To select a group of eight from a target population of 60 people, allocate each person a two-digit number from 01 to 60.


Using the tables, disregard any two-digit number outside the range.


Starting at the beginning of the first row and grouping in pairs gives
68 72 53 81 59 25 34 70 54 95 32 68 74 47 05


Disregarding 68, 72, 81, 70, 95, 68 and 74, which are out of range, you would choose people corresponding to the numbers


53, 59, 25, 34, 54, 32, 47, 05.
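The reading rule used in Example 9.2 (take pairs of digits in order, skip anything out of range, stop at eight) is mechanical enough to write down as a short function. This is a sketch; the function name `sample_from_digits` is hypothetical, and it also skips repeated labels, since the sampling is without replacement (no repeats happen to occur in this extract).

```python
# The extract of random number tables from Example 9.2, rows run together
TABLE = "6872538159" "2534705495" "3268744705"


def sample_from_digits(digits, n, low, high, width=2):
    """Read `digits` in blocks of `width` and keep labels in [low, high].

    Out-of-range blocks and repeats are skipped (sampling without
    replacement); reading stops once n labels have been chosen.
    """
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if low <= label <= high and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen


# (a) 100 people labelled 01-99 and 00, so every pair is a valid label
print(sample_from_digits(TABLE, 8, 0, 99))
# (a) reading the digits backwards from the bottom right
print(sample_from_digits(TABLE[::-1], 8, 0, 99))
# (b) 60 people labelled 01-60
print(sample_from_digits(TABLE, 8, 1, 60))
```

The three calls reproduce the three samples in the solution; note that Python prints the label 05 as 5.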


Example 9.3 


Use the following extract from random number tables to select a random sample of 12 numbers, each to two decimal places, from the continuous range 0 ≤ x < 10.


52 74 54 80 68 72 51 96 08 00
02 52 09 93 60 43 57 42 13


Solution 9.3 


Since the sample values are required to two decimal place accuracy, consider groups of three digits, inserting the decimal point between the first and second digit.


In this case your sample would consist of the values
5.27, 4.54, 8.06, 8.72, 5.19, 6.08, 0.00, 2.52, 0.99, 3.60, 4.35, 7.42.


Example 9.4 
Here is a set of random numbers 
848051 386103 153842 242330 580007 479971 


Use it to select a random sample of four numbers, each to three decimal places, from the continuous range 0 < x < 5.


Solution 9.4
Consider groups of four digits, inserting the decimal point between the first and second digit. Disregard any values that are out of range. This gives


8.480, 5.138, 6.103, 1.538, 4.224, 2.330, 5.800, 0.747

of which 8.480, 5.138, 6.103 and 5.800 are out of range. So the numbers chosen are 1.538, 4.224, 2.330, 0.747.
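The same idea extends to a continuous range, as in Example 9.4: cut the digits into blocks, put the decimal point after the first digit, and discard out-of-range values. A minimal sketch follows; the function name `continuous_sample` is hypothetical and it assumes, as here, a range of the form 0 < x < upper with one digit before the decimal point.

```python
def continuous_sample(digits, n, group, upper):
    """Turn blocks of `group` digits into numbers of the form d.dd...

    The decimal point goes after the first digit of each block; values
    outside 0 < x < upper are discarded, and reading stops after n values.
    """
    values = []
    for start in range(0, len(digits) - group + 1, group):
        value = int(digits[start:start + group]) / 10 ** (group - 1)
        if 0 < value < upper:
            values.append(value)
            if len(values) == n:
                break
    return values


DIGITS = "848051 386103 153842 242330 580007 479971".replace(" ", "")
print(continuous_sample(DIGITS, 4, 4, 5))   # Example 9.4: four values, 3 d.p., 0 < x < 5
```

Python displays 2.330 as 2.33, since a float does not keep trailing zeros; the value itself is the same.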


Calculator random number generator 


You probably have a random number generator key [Ran#] on your calculator. It gives a three-digit decimal number, for example 0.398, every time you press it. The numbers are generated using a mathematical formula and are really pseudo-random numbers, but they suit the purpose very well indeed.


Suppose you want to use your calculator to select a random sample of six numbers between 1 and 49 for your entry in the National Lottery.


To do this, you probably need to press [Shift] then [Ran#] [=].


Suppose the numbers you get are 


0.730, 0.798, 0.369, 0.499, 0.491, 0.310, 0.135, 0.112, 0.593, 0.652, 0.015, 0.346 


You can interpret them in various ways, for example: 


• If you decide to use the first two digits to the right of the decimal point each time, you would obtain the numbers 73, 79, 36, 49, 49, 31, 13, 11, 59, 65, 01, 34.

Ignoring repeats and numbers bigger than 49, the six numbers would be
36, 49, 31, 13, 11, 1.

• Suppose instead you decide to choose the second and third digits to the right of the decimal point and ignore repeats and numbers bigger than 49. In this case your numbers would be
30, 10, 35, 12, 15, 46.

• If you decide to use all the digits after the decimal point, you would be choosing from the digits 730798369499491310135112593652015346. Grouping these as two-digit numbers gives 73, 07, 98, 36, 94, 99, 49, 13, 10, 13, 51, 12, 59, 36, 52, 01, 53, 46.

Ignoring repeats and numbers bigger than 49 gives the six numbers as
7, 36, 49, 13, 10, 12.


The lists are endless! 
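The three readings of the calculator output can be reproduced in a few lines, which also makes it clear that they differ only in how the digits are grouped. This is a sketch; the helper `lottery_pick` is a hypothetical name, and the twelve values are the calculator outputs quoted above.

```python
# The twelve calculator outputs quoted in the text
CALCULATOR = [0.730, 0.798, 0.369, 0.499, 0.491, 0.310,
              0.135, 0.112, 0.593, 0.652, 0.015, 0.346]


def lottery_pick(candidates, how_many=6, low=1, high=49):
    """Keep the first `how_many` distinct candidates in [low, high]."""
    picks = []
    for number in candidates:
        if low <= number <= high and number not in picks:
            picks.append(number)
            if len(picks) == how_many:
                break
    return picks


# The three digits after the decimal point, e.g. 0.015 -> "015"
decimals = [f"{x:.3f}"[2:] for x in CALCULATOR]

first_two = [int(d[:2]) for d in decimals]     # 1st and 2nd digits
second_third = [int(d[1:]) for d in decimals]  # 2nd and 3rd digits
joined = "".join(decimals)                     # all digits run together
pairs = [int(joined[i:i + 2]) for i in range(0, len(joined) - 1, 2)]

print(lottery_pick(first_two))
print(lottery_pick(second_third))
print(lottery_pick(pairs))
```

In Python itself, `random.sample(range(1, 50), 6)` does the whole job in a single call, again using pseudo-random numbers.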


Systematic sampling 


Random sampling from a very large population is very cumbersome. 

An alternative procedure is to list the population in some order, for example alphabetically or 
in order of completion on a production line, and then choose every kth member from the list 
after obtaining a random starting point. If you choose every tenth member from the list, for 
example every tenth vehicle passing a checkpoint, you would form a 10% sample. If you 


choose every twentieth item, for example every twentieth card in an index file, you would 
form a 5% sample. 


Example 9.5 


Describe how to choose a systematic sample of eight members from a list of 300. 
Solution 9.5 


Since you are going to choose every kth member, you need to find a suitable value for k. To do this, choose a convenient value close to $\frac{N}{n}$.


In this case, $\frac{N}{n} = \frac{300}{8} = 37.5$, so $k = 40$ will do.


Now choose a random starting point. For example, if [Ran#] on your calculator gives 0.870, take the first member of the sample as 87 and then add 40 each time. The other members are 127, 167, 207, 247, 287, 27 and 67. Note that when you reach the end of the list, you go back to the beginning.


So the sample consists of 27, 67, 87, 127, 167, 207, 247, 287.
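The wrap-around rule of Example 9.5 amounts to counting modulo the length of the list. The sketch below uses a hypothetical helper `systematic_sample` with 1-based positions, as in the text.

```python
def systematic_sample(population_size, n, k, start):
    """Return the positions start, start + k, start + 2k, ... (1-based).

    Positions past the end of the list wrap round to the beginning, as in
    Example 9.5. The result is sorted for easy reading.
    """
    return sorted((start - 1 + i * k) % population_size + 1 for i in range(n))


N, n = 300, 8
print(N / n)                            # 37.5, so a convenient k is 40
print(systematic_sample(N, n, 40, 87))  # random start 87 from [Ran#] = 0.870
```

Choosing k = 37 or 38 instead would avoid the wrap-around altogether; k = 40 is simply a convenient round number.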


The advantages of systematic sampling are that it is quick to carry out and it is easy to check 


for errors. For large scale sampling, systematic selection is usually used in preference to taking 
simple random samples. 
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The disadvantage of this system is that there may be a periodic cycle within the frame itself. Non-random sampling 


For example a machine may operate in such a manner that every tenth item is faulty. sig 
Systematic sampling of every fifth item, starting at 5 Shia fue in etd eae ie = 

i i Id produce a sample with no x 
sample being faulty, whereas starting at 2 would p ; 
pare if the peiodie cycle is recognised then different samples could be taken by varying the 
starting points and the length of the interval between the chosen items. 


Stratified sampling

Stratified sampling is used when the population is split into distinguishable strata that are quite different from each other and which together cover the whole population, for example

- age groups,
- occupational groups,
- topographical regions.

Separate random samples are then taken from each stratum and put together to form the sample from the population.

It is usual to represent the population proportionately in the strata, as in the following example.


Example 9.6

Competent Carriers employs 320 drivers, 80 administrative staff and 40 mechanics. A committee to represent all the employees is to be formed. The committee is to have eleven members and the selection is to be made so that there is as close a representation as possible without bias towards any individuals or groups. Explain how this could be done.

Solution 9.6

If you were to take a simple random sample of all 440 employees, every employee would have an equal chance of being selected. There is, however, a chance that the committee would consist of 11 drivers and therefore would not be representative of all the employees.

A stratified random sample would provide a more accurate representation of the population and could be formed as follows.

Taking into account that drivers make up $\frac{320}{440}$ of the work force,
number of drivers = $\frac{320}{440} \times 11 = 8$
Similarly
number of administrative staff = $\frac{80}{440} \times 11 = 2$
number of mechanics = $\frac{40}{440} \times 11 = 1$

The required representation on the committee is eight drivers, two from the administrative staff and one mechanic. The people to be included can then be selected from each stratum using simple random sampling or systematic sampling.
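Proportional allocation, as in Solution 9.6, is one multiplication per stratum. A minimal Python sketch, assuming a hypothetical helper `proportional_allocation` that simply rounds each share to the nearest whole number:

```python
def proportional_allocation(strata_sizes, sample_size):
    """Share sample_size between the strata in proportion to their sizes,
    rounding each share to the nearest whole number."""
    total = sum(strata_sizes.values())
    return {name: round(size / total * sample_size)
            for name, size in strata_sizes.items()}


staff = {"drivers": 320, "administrative": 80, "mechanics": 40}
allocation = proportional_allocation(staff, 11)
print(allocation)  # {'drivers': 8, 'administrative': 2, 'mechanics': 1}
```

Here each share works out to a whole number. In general the rounded shares need not add up to the required sample size, so they may have to be adjusted by hand; note also that Python's `round` sends an exact half to the nearest even integer.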


Non-random sampling

(a) Cluster sampling


Sometimes there is a natural sub-grouping of the population and these subgroups are called 
clusters. For example, in a population consisting of all children in the country attending state 
primary schools, the local education authorities form natural clusters. When a sample survey 
is carried out on a population that can be broken into clusters it is often more convenient to 
first choose a random sample of clusters and then to sample within each cluster chosen. 


Unlike stratified sampling where the strata are as different from each other as possible, each 
cluster should be as similar to other clusters as possible. 


One advantage of cluster sampling is that there is no need to have a complete sampling frame 
of the whole population. For the primary school children, you would need only a list of the 
pupils in the chosen local authority. Another advantage is that it is usually far less costly than 
random sampling. Consider the fees and travelling expenses paid to interviewers. Far less 
travelling and time is involved in an interviewer visiting individuals in a cluster than visiting 
individuals in the whole population. 


The disadvantage of cluster sampling is that it is non-random. Suppose that a town has 7500 primary school children in 250 classes, each with an average class size of 30. If you want to select a sample of 90 children then you could use simple random sampling. It would, however, be quicker to use the classes as clusters and to take a sample consisting of three classes. This would give a sample of 90 children. The problem is that within each class there will be a certain amount of similarity between the children in, say, age, ability and home background. In selecting one whole class or cluster you are in fact selecting 30 similar children instead of 30 randomly chosen children from throughout the town. Therefore three clusters will not give as precise a picture of the whole population as 90 children chosen at random from throughout the town.


(b) Quota sampling 

Quota sampling is widely used in market research where the population is divided into groups 
in terms of age, sex, income level and so on. Then the interviewer is told how many people to 
interview within each specified group, but is given no specific instructions about how to locate 
them and fulfil the quota. This is the method generally used in street interview surveys 
commonly carried out in shopping centres. It is quick to use, complications are kept to a
minimum and, unlike random sampling, any member of the sample may be replaced by
another member with the same characteristics.


If no sampling frame exists, then quota sampling may be the only practical method of 
obtaining a sample. The disadvantage of quota sampling, however, is that it is non-random. 
There is a possibility of bias in the selection process if, for example, the interviewer selects 
those easiest to question or those who look cooperative. The location of such surveys in 
shopping centres excludes a substantial part of the population in that area. It is difficult to 
find out about those who refuse to cooperate and they are simply replaced. One of the reasons 
put forward to explain the inaccuracy of the opinion polls before the British general election 
in 1992 was the high refusal rates of Conservative voters to take part in surveys. 


Exercise 9a Sampling methods

1. Explain briefly the difference between a census and a sample survey.
Give an example to illustrate the practical use of each method.
A school held an evening disco which was attended by 500 pupils. The disco organisers were keen to assess the success of the evening. Having decided to obtain information from those attending the disco, they were undecided whether to use a census or a sample survey.
Which method would you recommend them to use?
Give one advantage and one disadvantage associated with your recommendation. (L)


2. A school of 1000 pupils is divided into year groups as follows

Year    Number of pupils
7       150
8       150
9       150
10      150
11      150
12      125
13      125

A survey is to be carried out and a committee representative of the school is to be formed consisting of 40 pupils.
It is decided that stratified sampling should be used.
(a) Calculate the number of pupils chosen from each year group.
(b) Explain how to choose the pupils from Year 7.


3. (a) Explain briefly
(i) why it is often desirable to take samples,
(ii) what you understand by a sampling frame.
(b) State two circumstances when you would consider using
(i) clustering,
(ii) stratification,
when sampling from a population.
(c) Give two advantages and two disadvantages associated with quota sampling. (L)


4. (a) A television company wishes to estimate the popularity of a particular television series by street interviews. Describe how the method of quota sampling might be used for this investigation.
(b) A meat canning factory supplies a supermarket with cans of meat in three sizes: large, medium and small.
The regular consignment is of 300 large cans, 500 medium cans and 400 small cans.
Describe how the supermarket could apply the method of stratified random sampling to a sample of 60 cans to test the quality of these goods.

5. Write brief notes on
(a) simple random sampling,
(b) quota sampling.
Your notes should include a description of each method, and an advantage and a disadvantage associated with it. (L)


6. In a school year group of 140 pupils there are 60 girls and 80 boys. A survey is to be taken to find methods to improve the school’s meal services. A sample of 14 members of this group is needed for the survey.
The school decides to use one of the following methods to obtain the names of pupils for the sample:
A: Every tenth name on the year group register is selected for the sample.
B: Each of the 140 names is allocated a different number from 1 to 140 inclusive; the school’s computer then picks 14 different random numbers between 1 and 140 inclusive.
(a) State briefly one advantage and one disadvantage of each method.
(b) Explain what is meant by a stratified random sample and describe how method B could be changed to give a stratified random sample.


7. Explain briefly the difference between a census and a sample, and give two reasons why a sample may be preferred to a census.
Explain the meaning and purpose of a sampling frame in random sampling.
It is required to obtain the views of the pupils of a school about the school magazine. It is decided to do this by means of a small panel of pupils. Describe briefly how you would select such a panel using
(a) simple random sampling,
(b) stratified random sampling.
State, with reason, which of these two sampling methods you consider to be the more appropriate for this situation. (AEB)
8. A research study into the use of hormone replacement therapy for women in the United Kingdom involved a survey of women in three general medical practices in Greater London. The designer of the survey describes his method of obtaining his sample as follows.

‘I obtained the names and addresses of 5025 women aged between 45 and 65 from the practices’ age-sex registers. The women were sent a questionnaire that asked whether they had received hormone replacement therapy.’
Source: British Medical Journal, December 1989

(a) Suggest one advantage and one disadvantage of this sampling method.
(b) Of the 5025 women contacted, 3238 returned a completed questionnaire, and 330 of these had received hormone replacement therapy. Given that there are about 703 000 women in the 45-65 age group living in Greater London, obtain an estimate for the number of 45-65 year-old women in Greater London who have received hormone replacement therapy. With reference to the sampling method used, comment on the reliability of this estimate.
(c) Suggest an alternative method of obtaining such an estimate. (NEAB)


SIMULATING RANDOM SAMPLES FROM GIVEN DISTRIBUTIONS 


A good way to simulate a random sample from a given distribution is to use cumulative proportional frequencies or cumulative probabilities, as illustrated in the following examples.


(a) From a frequency distribution

Example 9.7

Use the sequence of random digits 364294 588330 923918 400300 to generate five simulated observations from the following frequency distribution.

x     1    2    3    4
f     8   12   14    6    Total 40


Solution 9.7

Form first the cumulative frequencies and then transform them to cumulative proportional frequencies with a total proportion of 1. Then allocate the random numbers in a convenient way in accordance with the cumulative proportional frequencies.

x    f    Cumulative    Cumulative proportional    Corresponding
          frequency     frequency                  random numbers
1    8     8            0.20                       01 to 20
2   12    20            0.50                       21 to 50
3   14    34            0.85                       51 to 85
4    6    40            1.00                       86 to 99 and 00

Since the cumulative proportional frequencies contain two decimal places, it is appropriate to use two-digit random numbers. Note that 00 has been allocated to the x-value of 4 for convenience.

Take 5 two-digit random numbers from the list: 36, 42, 94, 58, 83.
Match these up with the corresponding sample values: 2, 2, 4, 3, 3.

So a random sample of size 5 from the given distribution is 2, 2, 3, 3, 4.
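The allocation of two-digit random numbers in Solution 9.7 can be checked with a short Python sketch. It assumes, as in the example, that 00 is read as 100 and so falls in the last class; the helper name `simulate` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

x_values = [1, 2, 3, 4]
frequencies = [8, 12, 14, 6]

# Upper ends of the random-number ranges: 20, 50, 85, 100 (00 is read as 100)
total = sum(frequencies)
upper_ends = [round(100 * c / total) for c in accumulate(frequencies)]


def simulate(two_digit_numbers):
    """Convert two-digit random numbers into sample values of x."""
    sample = []
    for r in two_digit_numbers:
        r = 100 if r == 0 else r          # 00 belongs to the last range
        # first range whose upper end is at least r
        sample.append(x_values[bisect_left(upper_ends, r)])
    return sample


digits = "364294 588330 923918 400300".replace(" ", "")
pairs = [int(digits[i:i + 2]) for i in range(0, 10, 2)]
sample = simulate(pairs)
print(pairs)   # [36, 42, 94, 58, 83]
print(sample)  # [2, 2, 4, 3, 3]
```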




(b) From a probability distribution

Example 9.8

Generate a random sample of size 10 from the given probability distribution, using the random numbers 3 7 4 7 6 5 3 3 9 0.

Solution 9.8

Form the cumulative distribution function F(x) and then allocate random numbers in a convenient way.

x    P(X = x)    F(x)    Corresponding random numbers
0    0.1         0.1     1
1    0.2         0.3     2, 3
2    0.4         0.7     4, 5, 6, 7
3    0.3         1       8, 9, 0

Take the 10 random numbers given and convert them to sample values:

Random number    3  7  4  7  6  5  3  3  9  0
Sample value     1  2  2  2  2  2  1  1  3  3

So the sample values are 1, 1, 1, 2, 2, 2, 2, 2, 3, 3.
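The same idea works with single digits when F(x) is given to one decimal place. In this Python sketch (names hypothetical) the digit 0 is read as 10, so it falls in the last range exactly as in the table above; working in whole tenths avoids comparing decimals that a computer cannot store exactly.

```python
from bisect import bisect_left

x_values = [0, 1, 2, 3]
probabilities = [0.1, 0.2, 0.4, 0.3]

# F(x) in whole tenths: 1, 3, 7, 10
upper_tenths = []
running = 0
for p in probabilities:
    running += round(10 * p)
    upper_tenths.append(running)


def digit_to_value(d):
    """Map a random digit to a value of x; the digit 0 is read as 10."""
    d = 10 if d == 0 else d
    return x_values[bisect_left(upper_tenths, d)]


random_digits = [3, 7, 4, 7, 6, 5, 3, 3, 9, 0]
sample = [digit_to_value(d) for d in random_digits]
print(sample)          # [1, 2, 2, 2, 2, 2, 1, 1, 3, 3]
print(sorted(sample))  # [1, 1, 1, 2, 2, 2, 2, 2, 3, 3]
```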


Example 9.9

Generate a random sample of size 4 from the binomial distribution X ~ B(4, 0.2), using the random numbers 2811 5747 6157 8988.

Solution 9.9

Calculate the cumulative probabilities, either by calculating probabilities first or by using cumulative probability tables directly (see page 682).

Remember that $P(X = x) = {}^4C_x\,(0.8)^{4-x}(0.2)^x$ for x = 0, 1, 2, 3, 4.

x    P(X = x)                                     F(x)      Corresponding random numbers
0    $0.8^4 = 0.4096$                             0.4096    0001 to 4096
1    $4 \times 0.8^3 \times 0.2 = 0.4096$         0.8192    4097 to 8192
2    $6 \times 0.8^2 \times 0.2^2 = 0.1536$       0.9728    8193 to 9728
3    $4 \times 0.8 \times 0.2^3 = 0.0256$         0.9984    9729 to 9984
4    $0.2^4 = 0.0016$                             1         9985 to 9999 and 0000

The random number 2811 is in the range 0001 to 4096 and so corresponds to x = 0.

Similarly 5747 corresponds to x = 1
6157 corresponds to x = 1
8988 corresponds to x = 2

So the random sample of four observations from the binomial distribution consists of the values 0, 1, 1, 2.
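A sketch of Solution 9.9 in Python, assuming four-digit random numbers with 0000 read as 10000. The cumulative probabilities are scaled to whole numbers of ten-thousandths so that the ranges match the table exactly; the helper names are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate
from math import comb


def binomial_cdf(n, p):
    """Cumulative probabilities F(0), F(1), ..., F(n) for B(n, p)."""
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    return list(accumulate(probs))


def upper_ends(cdf, scale=10_000):
    """Upper end of each random-number range, in whole ten-thousandths."""
    ends = [round(scale * F) for F in cdf]
    ends[-1] = scale          # the last range runs up to 9999 and 0000
    return ends


def to_sample_value(r, ends, scale=10_000):
    r = scale if r == 0 else r        # 0000 is read as 10000
    return bisect_left(ends, r)       # smallest x whose range contains r


ends = upper_ends(binomial_cdf(4, 0.2))
sample = [to_sample_value(r, ends) for r in (2811, 5747, 6157, 8988)]
print(ends)    # [4096, 8192, 9728, 9984, 10000]
print(sample)  # [0, 1, 1, 2]
```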


Example 9.10

Using the random number 8135 take a single random observation from a Poisson distribution with parameter 3.

Solution 9.10

X ~ Po(3).

Using cumulative Poisson probability tables (see page 648) and arranging the results in a table together with a convenient corresponding random number allocation gives:

x            F(x)      Corresponding random numbers
0            0.0498    0001 to 0498
1            0.1991    0499 to 1991
2            0.4232    1992 to 4232
3            0.6472    4233 to 6472
4            0.8153    6473 to 8153
5            0.9161    8154 to 9161
6            0.9665    9162 to 9665
7            0.9881    9666 to 9881
8 or over    1         9882 to 9999 and 0000

The given random number 8135 is in the range 6473 to 8153, so the random observation corresponds to x = 4.
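For Solution 9.10 the cumulative Poisson probabilities can be generated directly instead of read from tables, using the recurrence $P(X = x) = P(X = x - 1) \times \frac{\lambda}{x}$. The Python sketch below assumes the same four-digit allocation as the table, with the final range covering "8 or over"; the names are hypothetical.

```python
from bisect import bisect_left
from math import exp


def poisson_cdf(lam, x_max):
    """F(0), ..., F(x_max) for Po(lam), using P(x) = P(x - 1) * lam / x."""
    prob = exp(-lam)
    cdf = [prob]
    for x in range(1, x_max + 1):
        prob *= lam / x
        cdf.append(cdf[-1] + prob)
    return cdf


# x = 0, ..., 7 get their own ranges; "8 or over" takes 9882 to 9999 and 0000
ends = [round(10_000 * F) for F in poisson_cdf(3, 7)] + [10_000]


def to_x(r):
    """Return the sample value for r; the value 8 stands for '8 or over'."""
    r = 10_000 if r == 0 else r
    return bisect_left(ends, r)


print(ends)        # [498, 1991, 4232, 6472, 8153, 9161, 9665, 9881, 10000]
print(to_x(8135))  # 4
```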
Example 9.11

Using the random digits 723 850 take a random sample of size 2 from the continuous distribution with probability density function

$f(x) = \frac{3}{8}x^2$ for $0 < x < 2$.

Solution 9.11

The cumulative distribution function is given by

$F(x) = \int_0^x \frac{3}{8}t^2\,dt = \frac{x^3}{8}$ for $0 < x < 2$.

Taking the first three random digits:
if F(x) = 0.723, then
$\frac{x^3}{8} = 0.723$
and $x = \sqrt[3]{8 \times 0.723} = 1.80$ (2 d.p.)

Taking the next three random digits:
if F(x) = 0.850, then
$\frac{x^3}{8} = 0.850$
and $x = \sqrt[3]{8 \times 0.850} = 1.89$ (2 d.p.)

So the two random observations are x = 1.80 and x = 1.89.
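Solution 9.11 sets F(x) equal to a random number u between 0 and 1 and solves for x. For $F(x) = \frac{x^3}{8}$ the solution is $x = \sqrt[3]{8u}$, as in this Python sketch (the function name is hypothetical):

```python
def inverse_cdf(u):
    """Solve F(x) = x**3 / 8 = u for x, where 0 < x < 2 and 0 < u < 1."""
    return (8 * u) ** (1 / 3)


digits = "723850"
us = [int(digits[i:i + 3]) / 1000 for i in (0, 3)]   # 0.723 and 0.850
sample = [round(inverse_cdf(u), 2) for u in us]
print(us)      # [0.723, 0.85]
print(sample)  # [1.8, 1.89]
```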


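Example 9.12

Use the random numbers 382 824 to take a random sample of 2 from the normal distribution N(30, 4).

Solution 9.12

X ~ N(30, 4), so $\mu = 30$ and $\sigma = 2$.
Cumulative probabilities $\Phi(z)$ are given in the standard normal tables (see page 649).

Taking the first three digits of the random number list,
$\Phi(z) = 0.382$
$z = \Phi^{-1}(0.382) = -0.3$
$\frac{x - 30}{2} = -0.3$, so $x = 30 - 0.6 = 29.4$

Now take the second three digits,
$\Phi(z) = 0.824$
$z = \Phi^{-1}(0.824) = 0.931$
$\frac{x - 30}{2} = 0.931$, so $x = 30 + 1.862 = 31.862 = 31.9$ (1 d.p.)

[Diagrams: standard normal curves with area 0.382 to the left of z = -0.3 and area 0.824 to the left of z = 0.931; the x scale shows 30 and 31.862 beneath z = 0 and z = 0.931.]

So the two random observations are 29.4 and 31.9.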
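Python's standard library provides the inverse of the normal cumulative distribution function as `statistics.NormalDist.inv_cdf` (Python 3.8 or later). A sketch reproducing Solution 9.12 under that assumption; it computes $\Phi^{-1}$ directly rather than reading tables, so intermediate z-values can differ slightly from the tabulated ones.

```python
from statistics import NormalDist

X = NormalDist(mu=30, sigma=2)        # N(30, 4): variance 4, standard deviation 2
us = [0.382, 0.824]                   # the random numbers 382 and 824 as decimals

z_values = [NormalDist().inv_cdf(u) for u in us]   # standard normal quantiles
sample = [round(X.inv_cdf(u), 1) for u in us]
print([round(z, 3) for z in z_values])  # approximately [-0.3, 0.931]
print(sample)                           # [29.4, 31.9]
```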


Exercise 9b Simulating random samples from given distributions

In the following, use the random number tables on page 653 if random numbers have not been given in the question.

1. Select a random sample of size 10 (to 3 d.p.) from the continuous range 3 < x < 9.

2. Draw up a random sample of 100 numbers from the discrete integer range 0 to 9. Find the mean and variance of the sample values and compare them with the theoretical mean and variance.


3. The discrete random variable X has probability distribution

x           5     6     7     8     9


P(X =x) O45. 0.2° 0.330.240" 0.11 


Simulate a sample of size 12 from the distribution of X. Compare the mean and variance of this sample with E(X) and Var(X).

4. The discrete random variable X has distribution function $F(x) = \frac{1}{4}(x - 2)$, x = 3, 4, 5, 6. Using random number tables, generate 10 observations of X, showing your working clearly.

Describe how you would select a random sample of 30 pupils from a school containing 850 pupils.

You wish to select a person at random from a group of 58 people. The following procedure is suggested:

Allocate the numbers 1 to 58 to the people. Choose a line in a table of random numbers and call the first two digits x and y. Let z = 10x + y. If 1 ≤ z ≤ 58 then the person who was allocated the number z is selected. Otherwise, the person allocated the number z − 58 is selected.

Comment on this method of selection.


6. Take a random sample of size 6 from the distribution:

x    15    16    17    18    19
f    13    15    12     6     4

7. Take a random sample of size 3 from the distribution:

x    2.3    2.4    2.5    2.6    2.7
f    40     60     90     50     60


8. Take a random sample of size 10 from each of the following probability distributions. In each case, find the sample mean and variance and compare with E(X) and Var(X).


a) PSI 4 


P(X =x): O14 0.200 0.450" 0.24 


(b) P(X = x) = kx, x = 0, 1, 2, 3.

9. Take a random sample of size 5 from the distribution of X where $F(x) = \frac{1}{5}x$, x = 2, 3, 4, 5.

10. (a) The discrete random variable X is such that X ~ B(3, 0.4). Take a random sample of size 5 from this distribution, using the random numbers
407 315 401 203 972
(b) Using the random number 6143 take a single random observation from the Poisson distribution with parameter 4.

11. Using the random numbers 267 394 018 take a random sample of size 3 from the normal distribution with mean 35 and variance 9.


12. Using the random numbers 2654 9342, make 
two random observations from each of the 
following distributions: 


(a) The number of seeds that germinate in a 
group of 5 selected at random, given that 
75% are expected to germinate. 

(b) The number of goals in a football match, 
where the number of goals follows a Poisson 
distribution with variance 2.4. 

(c) The mass of a bag of sugar, where the mass 
is normally distributed with mean 1010 g 
and standard deviation 4.5 g. 


13. Using the random number 256 construct a 
random observation of the continuous random 
variable X where 


F(x)= 4x", 0x <3, 


14. Take 20 samples, each of size 2, from the following distribution:

x    1     2     3     4     5
f    10    15    25    35    15

Calculate the mean of each sample and find the mean and variance of the sample means. Find the mean and variance of the original distribution. Comment.


15. You are given the random number 431. Use this number to obtain a sample observation from
(a) a binomial distribution with n = 12 and p = 0.4,
(b) a normal distribution with mean 6.2 and standard deviation 0.1.
You are expected to explain clearly how you obtain the sample observations. (O)

16. The digits 8453276 are obtained from a table of random digits. Use them to obtain a random observation from each of the following distributions:
(a) the number of the winning ticket in a lottery in which there are 500 ticket numbers from 1 to 500 and every ticket has the same chance of being selected,
(b) the number of babies born in a cottage hospital in a week, assuming that on average one baby is born every three days and that births are independent (and ignoring the possibility of multiple births). (O)

SAMPLE STATISTICS

When you are trying to find out information about a population it seems sensible to take random samples and then consider the values obtained from them. It is therefore useful to know how these sample values are distributed.


THE DISTRIBUTION OF THE SAMPLE MEAN

Imagine carrying out the following procedure:

• Take a random sample of n independent observations from a population. Note that from a finite population, sampling should be with replacement to ensure that the observations are independent.
• Calculate the mean of these n sample values. This is known as the sample mean.
• Now repeat the procedure until you have taken all possible samples of size n, calculating the sample mean of each one.
• Form a distribution of all the sample means.

The distribution that would be formed is called the sampling distribution of means.
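The procedure above can be carried out completely for a very small population. This Python sketch uses a hypothetical population {2, 4, 6, 8}, sampled with replacement, and lists every ordered sample of size n; `Fraction` keeps the arithmetic exact. It anticipates the results $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$.

```python
from fractions import Fraction
from itertools import product


def mean_and_variance(values):
    """Mean and variance of a list of equally likely values."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


population = [Fraction(v) for v in (2, 4, 6, 8)]  # hypothetical population
mu, sigma2 = mean_and_variance(population)        # mu = 5, sigma2 = 5

n = 2
# Sampling with replacement: every ordered n-tuple is equally likely
sample_means = [sum(s) / n for s in product(population, repeat=n)]
mean_of_means, var_of_means = mean_and_variance(sample_means)
print(mu, sigma2)                    # 5 5
print(mean_of_means, var_of_means)   # 5 5/2
```

Changing `n` to 3 gives a variance of the sample means of 5/3, again equal to $\frac{\sigma^2}{n}$.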


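The mean and variance of the sampling distribution of means

It is possible to work out the mean and variance of this sampling distribution using expectation algebra.

Consider a population X in which $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. Take n independent observations $X_1, X_2, \ldots, X_n$ from the population.

Since $E(X) = \mu$,
$E(X_1) = \mu,\ E(X_2) = \mu,\ \ldots,\ E(X_n) = \mu$
Since $\mathrm{Var}(X) = \sigma^2$,
$\mathrm{Var}(X_1) = \sigma^2,\ \mathrm{Var}(X_2) = \sigma^2,\ \ldots,\ \mathrm{Var}(X_n) = \sigma^2$

The sample mean is
$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n$

$E(\bar{X}) = E\left(\frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n\right)$
$= E\left(\frac{1}{n}X_1\right) + E\left(\frac{1}{n}X_2\right) + \cdots + E\left(\frac{1}{n}X_n\right)$
$= \frac{1}{n}E(X_1) + \frac{1}{n}E(X_2) + \cdots + \frac{1}{n}E(X_n)$    using $E(aX) = aE(X)$ (see page 246)
$= \frac{1}{n}\mu + \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu$
$= n \times \frac{1}{n}\mu$
$= \mu$

$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n\right)$
$= \mathrm{Var}\left(\frac{1}{n}X_1\right) + \mathrm{Var}\left(\frac{1}{n}X_2\right) + \cdots + \mathrm{Var}\left(\frac{1}{n}X_n\right)$    since the observations are independent
$= \left(\frac{1}{n}\right)^2\mathrm{Var}(X_1) + \left(\frac{1}{n}\right)^2\mathrm{Var}(X_2) + \cdots + \left(\frac{1}{n}\right)^2\mathrm{Var}(X_n)$    using $\mathrm{Var}(aX) = a^2\mathrm{Var}(X)$
$= \frac{1}{n^2}\sigma^2 + \frac{1}{n^2}\sigma^2 + \cdots + \frac{1}{n^2}\sigma^2$
$= n \times \frac{1}{n^2}\sigma^2$
$= \frac{\sigma^2}{n}$

$E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$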
The standard deviation of the sampling distribution is $\sqrt{\frac{\sigma^2}{n}}$, usually written $\frac{\sigma}{\sqrt{n}}$. This is known as the standard error of the mean.

The mean of the sampling distribution is the same as the mean of the population.

For n > 1, the standard deviation of the sampling distribution is smaller than that of the population, since $\sigma^2$ has been divided by n. This implies that the sample means are more closely clustered around $\mu$ than the population values are. In fact, the larger the sample size, the more clustered they are.


The following diagrams help to illustrate the shape of the sampling distribution of means 
resulting from different sized samples from given populations. 


(a) The distribution of $\bar{X}$ when the population of X is normal

[Figure: the distribution of X when X ~ N(100, 64), and the distributions of $\bar{X}$ for means of samples of size n = 2, 5 and 25, all centred at 100.]
From the diagrams, you can see that if samples are taken from a normal population, the sampling distribution of means is normal for any sample size.

If $X \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Example 9.13

At a college the masses of the male students can be modelled by a normal distribution with mean mass 70 kg and standard deviation 5 kg. Four male students are chosen at random. Find the probability that their mean mass is less than 65 kg.

Solution 9.13

X is the mass, in kilograms, of a male student at the college and $X \sim N(\mu, \sigma^2)$, with $\mu = 70$ and $\sigma = 5$.

Since the distribution of X is normal, the distribution of $\bar{X}$ is also normal and

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ with $\mu = 70$, $\sigma^2 = 25$, $n = 4$,
i.e. $\bar{X} \sim N\left(70, \frac{25}{4}\right)$

so $\bar{X} \sim N(70, 6.25)$

$P(\bar{X} < 65) = P\left(Z < \frac{65 - 70}{\sqrt{6.25}}\right)$
$= P(Z < -2)$
$= 1 - \Phi(2)$
$= 1 - 0.9772$
$= 0.0228$

The probability that the mean mass is less than 65 kg is 0.023 (2 s.f.).

The diagram below shows the distributions of X and $\bar{X}$ drawn to scale.

[Figure: X ~ N(70, 25) and $\bar{X}$ ~ N(70, 6.25) drawn on the same axis.]
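A quick check of Solution 9.13 using `statistics.NormalDist` (Python 3.8 or later). Note that `NormalDist` takes the standard deviation, not the variance, so the standard error $\frac{\sigma}{\sqrt{n}} = 2.5$ is passed in.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 70, 5, 4
xbar = NormalDist(mu, sigma / sqrt(n))   # N(70, 6.25): standard error 2.5
p = xbar.cdf(65)                          # P(Xbar < 65) = P(Z < -2)
print(xbar.variance)   # 6.25
print(round(p, 4))     # 0.0228
```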


Example 9.14

The distribution of the random variable X is N(25, 340). The mean of a random sample of size n drawn from this distribution is $\bar{X}$. Find the value of n, correct to two significant figures, given that $P(\bar{X} > 28)$ is approximately 0.005.

Solution 9.14

$X \sim N(25, 340)$

For samples of size n, $\bar{X} \sim N\left(25, \frac{340}{n}\right)$

$P(\bar{X} > 28) = P\left(Z > \frac{28 - 25}{\sqrt{340/n}}\right) = P\left(Z > \frac{3\sqrt{n}}{\sqrt{340}}\right)$

You are given that $P(\bar{X} > 28) = 0.005$, so

$P\left(Z > \frac{3\sqrt{n}}{\sqrt{340}}\right) = 0.005$
$P\left(Z < \frac{3\sqrt{n}}{\sqrt{340}}\right) = 1 - 0.005 = 0.995$
$\frac{3\sqrt{n}}{\sqrt{340}} = \Phi^{-1}(0.995) = 2.576$
$3\sqrt{n} = 2.576 \times \sqrt{340}$

Squaring both sides,
$9n = 2.576^2 \times 340$
$n = 250.68\ldots$
so n = 250 (2 s.f.).

[Figure: standard normal curve with an upper tail of area 0.005 beyond z = 2.576, corresponding to $\bar{X}$ = 28 on a scale with mean 25.]
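Solution 9.14 rearranges $P(\bar{X} > 28) = 0.005$ to find n. The same algebra in Python, assuming `statistics.NormalDist().inv_cdf` for $\Phi^{-1}$; because it carries z to more decimal places than 2.576, the value of n comes out slightly different from 250.68, but it still rounds to 250 (2 s.f.).

```python
from statistics import NormalDist

mu, variance, threshold, tail = 25, 340, 28, 0.005

z = NormalDist().inv_cdf(1 - tail)        # Phi^{-1}(0.995), about 2.576
# (28 - 25) * sqrt(n) / sqrt(340) = z  =>  n = z**2 * 340 / 3**2
n = z**2 * variance / (threshold - mu) ** 2
print(round(z, 3))   # 2.576
print(round(n, 1))
```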
(b) The distribution of $\bar{X}$ when X is not normally distributed

The following diagrams illustrate the distribution of $\bar{X}$ for samples of different sizes taken from a population X:

(i) Distribution of X when X ~ B(10, 0.25)
[Figure: bar chart of X for x = 0 to 10, followed by the distribution of $\bar{X}$ for means of samples of size 10, 15 and 30.]

(ii) Distribution of X when X ~ Po(4)
[Figure: bar chart of X, followed by the distribution of $\bar{X}$ for means of samples of size 10, 15 and 30.]

(iii) Distribution of X when X ~ R(3, 7)
[Figure: rectangular distribution of X on 3 < x < 7 with height 0.25, followed by the distribution of $\bar{X}$ for means of samples of size 10, 15 and 30.]
CENTRAL LIMIT THEOREM

From the diagrams you can see that when samples are taken from a population that is not normally distributed, the sampling distribution takes on the characteristic normal shape as the sample size increases. For large n the distribution of the sample mean is approximately normal.

This result is known as the central limit theorem. It is somewhat surprising, since it holds both when X is discrete (as in the binomial and Poisson distributions) and when X is continuous (as in the uniform distribution).

For samples taken from a non-normal population with mean $\mu$ and variance $\sigma^2$, by the central limit theorem, $\bar{X}$ is approximately normal and

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately,

provided that the sample size, n, is large (n > 30 say).
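The central limit theorem can also be seen in a simulation. The Python sketch below (the seed and the number of repetitions are arbitrary choices) draws 20 000 samples of size 30 from the rectangular distribution R(3, 7) of diagram (iii) and finds the proportion of sample means lying within 1.96 standard errors of $\mu$; for an exactly normal $\bar{X}$ this proportion would be about 0.95.

```python
import random
from statistics import NormalDist, fmean

random.seed(2024)                    # arbitrary seed so the run can be repeated
a, b = 3, 7                          # X ~ R(3, 7)
n, reps = 30, 20_000                 # sample size and number of samples

mu = (a + b) / 2                     # E(X) = 5
se = ((b - a) ** 2 / 12 / n) ** 0.5  # standard error sigma / sqrt(n)

means = [fmean(random.uniform(a, b) for _ in range(n)) for _ in range(reps)]

# Proportion of sample means within 1.96 standard errors of mu
inside = sum(abs(m - mu) < 1.96 * se for m in means) / reps
normal_value = NormalDist().cdf(1.96) - NormalDist().cdf(-1.96)  # about 0.95
print(round(fmean(means), 3), round(inside, 3), round(normal_value, 3))
```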


Example 9.15

Thirty random observations are taken from each of the following distributions and the sample mean calculated. Find, in each case, the probability that the sample mean exceeds 5.

(a) X is the number of telephone calls made in an evening to a counselling service, where X ~ Po(4.5).

(b) X is the number of heads obtained when an unbiased coin is tossed nine times.

(c) X is distributed uniformly throughout the range 2 < x < 7.


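Solution 9.15

(a) X ~ Po(4.5)
$\mu = \lambda = 4.5$, $\sigma^2 = \lambda = 4.5$

By the central limit theorem, since n is large, $\bar{X}$ is approximately normal,
so $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ with n = 30,
i.e. $\bar{X} \sim N\left(4.5, \frac{4.5}{30}\right)$
$\bar{X} \sim N(4.5, 0.15)$

$P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.15}}\right)$
$= P(Z > 1.291)$
$= 1 - \Phi(1.291)$
$= 1 - 0.9017$
$= 0.098$ (2 s.f.)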


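(b) X ~ B(9, 0.5)
$\mu = np = 9 \times 0.5 = 4.5$
$\sigma^2 = npq = 9 \times 0.5 \times 0.5 = 2.25$

By the central limit theorem, since n is large, $\bar{X}$ is approximately normal and
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ with n = 30,
i.e. $\bar{X} \sim N\left(4.5, \frac{2.25}{30}\right)$
$\bar{X} \sim N(4.5, 0.075)$

$P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.075}}\right)$
$= P(Z > 1.826)$
$= 1 - \Phi(1.826)$
$= 1 - 0.9660$
$= 0.034$ (2 s.f.)

(c) When X is uniformly distributed and a < x < b,
$E(X) = \frac{1}{2}(a + b)$ and $\mathrm{Var}(X) = \frac{1}{12}(b - a)^2$ (see page 363)
Since X is defined for 2 < x < 7, a = 2 and b = 7,
so $\mu = E(X) = \frac{1}{2}(2 + 7) = 4.5$ and $\sigma^2 = \mathrm{Var}(X) = \frac{1}{12}(7 - 2)^2 = \frac{25}{12}$

By the central limit theorem, since n is large, $\bar{X}$ is approximately normal and
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ with n = 30,
i.e. $\bar{X} \sim N\left(4.5, \frac{25/12}{30}\right)$
$\bar{X} \sim N(4.5, 0.0694\ldots)$

$P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.0694\ldots}}\right)$
$= P(Z > 1.897)$
$= 1 - \Phi(1.897)$
$= 1 - 0.9711$
$= 0.029$ (2 s.f.)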
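The three parts of Solution 9.15 follow the same pattern, so they can be written as one loop. A Python sketch using `statistics.NormalDist`, with the population means and variances taken from the solution; like the solution, it applies the normal approximation without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist

n, threshold = 30, 5
# (population mean, population variance) for each part of Example 9.15
populations = {
    "(a) Po(4.5)": (4.5, 4.5),
    "(b) B(9, 0.5)": (9 * 0.5, 9 * 0.5 * 0.5),
    "(c) uniform on 2 < x < 7": ((2 + 7) / 2, (7 - 2) ** 2 / 12),
}

results = {}
for label, (mu, var) in populations.items():
    xbar = NormalDist(mu, sqrt(var / n))      # central limit theorem: N(mu, var / n)
    results[label] = 1 - xbar.cdf(threshold)  # P(Xbar > 5)
    print(label, round(results[label], 3))    # 0.098, 0.034, 0.029
```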


Exercise 9c The distribution of the sample mean, $\bar{X}$

1. The volumes of wine in bottles are normally distributed with a mean of 758 ml and a standard deviation of 12 ml. A random sample of … bottles is taken and the mean volume found. Find the probability that the sample mean is … ml.

2. The heights of a new variety of sunflower can be modelled by a normal distribution with mean 2 m and standard deviation of 40 cm.
(a) A random sample containing 50 sunflowers is taken and the mean height calculated. What is the probability that the sample mean lies between 195 cm and 205 cm?
(b) A hundred such samples are taken. In how many of these would you expect the sample mean to be … than 210 cm?


3. In an examination taken by a large number of students the mean mark was 64.5 and the variance was 64. The mean mark in a random sample of 100 scripts is denoted by $\bar{X}$. Find
(a) $P(\bar{X} > 65.5)$
(b) $P(63.8 < \bar{X} < 64.5)$


4. The mean of 50 observations of X, where X ~ B(12, 0.4), is $\bar{X}$.
(a) State the approximate distribution of $\bar{X}$.
(b) Hence find $P(\bar{X} < 5)$.

5. A normal variable X has standard deviation $\sigma$. The mean of 20 independent observations of X is $\bar{X}$.
(a) Given that $\mathrm{Var}(\bar{X}) = 3.2$, find the value of $\sigma$.
(b) Would your answer be different if the variable was not normal?


6. Independent observations are taken from a normal distribution with mean 30 and variance 5.
(a) Find the probability that the average of 10 observations exceeds 30.5.
(b) Find the probability that the average of 40 observations exceeds 30.5.
(c) Find the probability that the average of 100 observations exceeds 30.5.
(d) Find the least value of n such that the probability that the average of n observations exceeds 30.5 is less than 1%.


7. The standard deviation of the masses of articles in a large population is 4.55 kg. Random samples of size 100 are drawn from the population. Find the probability that a sample mean will differ from the population mean by less than 0.8 kg.

8. The variable X is such that $X \sim N(\mu, 4)$. A random sample of size n is taken from the population. Find the least n such that $P(|\bar{X} - \mu| < 0.5) > 0.95$.


9. (a) A large number of random samples of size n are taken from B(20, 0.2). Approximately 90% of the sample means are less than 4.354. Estimate n.
(b) A large number of random samples of size n are taken from Po(2.9). Approximately 1% of the sample means are greater than 3.41. Estimate n.

10. The random variable X has standard deviation $\sigma$. The mean of 40 observations of X is $\bar{X}$. Given that $\mathrm{Var}(\bar{X}) = 0.625$, find the value of $\sigma$.


11. The mean of a sample of 100 observations of the random variable X is denoted by $\bar{X}$. The mean of X is 20 and the standard deviation of X is 0.3. Find the mean and the standard deviation of $\bar{X}$.

12. A sample of n independent observations is taken from a normal population with mean 74 and standard deviation 6. The sample mean is denoted by $\bar{X}$.
(a) Find n if $P(\bar{X} > 75) = 0.282$.
(b) Find n if $P(\bar{X} < 70.4) = 0.0037$.


13. To estimate the mean and standard deviation of the life of a certain brand of car tyre a large number of random samples of size 50 were tested. The mean and standard error of the sampling distribution obtained were 20 500 km and 250 km respectively. Estimate the mean and standard deviation of the life of this brand of car tyre. Explain what part the use of the central limit theorem has played in the calculations.


14. The diameters, x, of 110 steel rods were measured in centimetres and the results were summarised as follows:

$\sum x = 36.5$, $\sum x^2 = 12.49$

Find the mean and standard deviation of these measurements.
Assuming these measurements are a sample from a normal distribution with this mean and this variance, find the probability that the mean diameter of a sample of size 110 is greater than 0.345 cm. (O & C)


15. In a certain nation, men have heights distributed normally with mean 1.70 m and standard deviation 10 cm. Find the probability that a man chosen randomly has height not less than 1.83 m.
What is the probability that the average height of 3 men chosen randomly is greater than 1.78 m and the probability that all three will have heights greater than 1.83 m? (MEI)


16. Two red balls and 2 white balls are placed in a bag. Balls are drawn one by one, at random and without replacement. The random variable X is the number of white balls drawn before the first red ball is drawn.
(a) Show that $P(X = 1) = \frac{1}{3}$, and find the rest of the probability distribution of X.
(b) Find E(X) and show that $\mathrm{Var}(X) = \frac{5}{9}$.
(c) The sample mean for 80 independent observations of X is denoted by $\bar{X}$. Using a suitable approximation, find $P(\bar{X} > 0.78)$.


THE DISTRIBUTION OF THE SAMPLE PROPORTION, p 


Suppose a random sample of observations is taken from a population in which the 
proportion of successes is p and the proportion of failures is qz=1—p 


If X is the number of successes in the samp] i i 
ple, then X follows a binomial distribution i 
X ~ B(n, p) and E(X) = np, Var(X) = ngp (see page 286) eR, 


The random variable for the proportion of success in the sample is Z , 


This can be written P, where P, = & nes x 


non 


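It is possible to work out the mean and the variance of $P_s$ using expectation algebra as follows:

$E(P_s) = E\left(\frac{1}{n}X\right) = \frac{1}{n}E(X) = \frac{1}{n} \times np = p$

$\text{Var}(P_s) = \text{Var}\left(\frac{1}{n}X\right) = \left(\frac{1}{n}\right)^2 \text{Var}(X) = \frac{1}{n^2} \times npq = \frac{pq}{n}$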
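The two results $E(P_s) = p$ and $\text{Var}(P_s) = \frac{pq}{n}$ can be checked numerically. The sketch below, in Python using only the standard library, assumes nothing beyond $X \sim B(n, p)$: it sums over the exact binomial probabilities. The function name `sample_proportion_moments` and the values n = 20, p = 0.3 are illustrative choices, not taken from the text.

```python
from math import comb

def sample_proportion_moments(n, p):
    """Exact mean and variance of P_s = X/n when X ~ B(n, p)."""
    q = 1 - p
    # binomial probabilities P(X = k) for k = 0, 1, ..., n
    probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum((k / n) * pk for k, pk in enumerate(probs))
    var = sum((k / n - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, var

n, p = 20, 0.3
mean, var = sample_proportion_moments(n, p)
print(mean, var)   # close to p = 0.3 and pq/n = 0.0105
```

Any other n and p should give the same agreement, up to floating-point rounding.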


The distribution of $P_s$ is known as the sampling distribution of proportions. The standard
deviation of this distribution is $\sqrt{\frac{pq}{n}}$ and it is known as the standard error of proportion.

NOTE: When considering the normal approximation to the binomial distribution, a
continuity correction of $\pm\frac{1}{2}$ is needed (see page 383).

Since $P_s = \frac{1}{n}X$, use a continuity correction of $\frac{1}{n}\left(\pm\frac{1}{2}\right)$, i.e. $\pm\frac{1}{2n}$.


Example 9.16 


It is known that 3% of frozen pies delivered to a canteen are broken. What is the probability
that, on a morning when 500 pies are delivered, 5% or more are broken?


Solution 9.16 


Let p be the probability that a pie is broken, so p = 0.03.

Let $P_s$ be the proportion of pies in the sample that are broken.

Then, approximately, $P_s \sim N\left(p, \frac{pq}{n}\right)$ with p = 0.03, q = 0.97, n = 500,

i.e. $P_s \sim N\left(0.03, \frac{0.03 \times 0.97}{500}\right)$, i.e. $P_s \sim N(0.03, 0.000\,058\,2)$.

To find the probability that 5% or more are broken,

find $P(P_s \ge 0.05) = P\left(P_s > 0.05 - \frac{1}{2 \times 500}\right)$ (continuity correction)
$= P(P_s > 0.049)$
$= P\left(Z > \frac{0.049 - 0.03}{\sqrt{0.000\,058\,2}}\right)$
$= P(Z > 2.491)$
$= 1 - \Phi(2.491)$
$= 0.0064$ (2 s.f.)


Alternative method for Solution 9.16 


Instead of considering $P_s$, the proportion of broken pies, you could consider X, the number of
broken pies in the sample.

In this case, $X \sim B(n, p)$ with n = 500, p = 0.03, q = 1 - p = 0.97.

Now $np = 500 \times 0.03 = 15$ and $nq = 500 \times 0.97 = 485$.

Since n is large such that np > 5 and nq > 5, use the normal approximation to the binomial
distribution (see page 382),
where $X \sim N(np, npq)$ with np = 15 and $npq = 500 \times 0.03 \times 0.97 = 14.55$,

i.e. $X \sim N(15, 14.55)$.

You want the probability that 5% or more are broken.
5% of 500 = 25, so find the probability that 25 or more are broken.

$P(X \ge 25) = P(X > 24.5)$ (continuity correction)
$= P\left(Z > \frac{24.5 - 15}{\sqrt{14.55}}\right)$
$= P(Z > 2.491)$
$= 0.0064$ (as above)


NOTE: Since the same underlying theory has been used, probabilities can be found either by
considering $P_s$, the distribution of sample proportions, or by considering X, the distribution
of the number of successes, and applying the normal approximation to the binomial
distribution. In either case, the sample size, n, must be large.

Note that if the continuity correction is used in both cases, or omitted in both cases, the
standardised z-values will agree exactly.
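As a check on the note above, the sketch below works through Example 9.16 both ways in Python, using the standard library's `statistics.NormalDist` for $\Phi$. The variable names are illustrative. With the continuity correction applied in both methods, the two z-values coincide.

```python
from math import sqrt
from statistics import NormalDist

n, p = 500, 0.03
q = 1 - p
Z = NormalDist()   # standard normal N(0, 1)

# Method 1: sample proportion P_s ~ N(p, pq/n), continuity correction 1/(2n)
z_prop = (0.05 - 1 / (2 * n) - p) / sqrt(p * q / n)

# Method 2: number broken X ~ N(np, npq), continuity correction 1/2
z_count = (0.05 * n - 0.5 - n * p) / sqrt(n * p * q)

prob = 1 - Z.cdf(z_prop)
print(round(z_prop, 3), round(z_count, 3))   # 2.491 2.491
print(round(prob, 4))                        # 0.0064
```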


Exercise 9d Distribution of sample proportions (large samples)

1. 2% of the trees in a plantation are known to
have a certain disease. A random sample of 300
trees is checked. Find the probability that the
proportion of diseased trees in the sample is
(a) less than 1%,
(b) more than 4%.

2. A fair coin is tossed 150 times. Find the
probability that
(a) fewer than 40% of the tosses will result in
heads,
(b) between 40% and 50% (inclusive) of the
tosses will result in heads,
(c) at least 55% of the tosses will result in
heads.

3. A fair coin is tossed 300 times.
Work through part (c) as in question 2.
Explain why your answer is different from that
obtained in question 2.

4. Mr Hand gained 48% of the votes in the District
Council elections.
(a) Find the probability that a poll of 100
randomly selected voters would show over
50% in favour of Mr Hand.
(b) Find the corresponding probability if the
sample consists of 1000 randomly selected
voters.

5. Three-quarters of the households in a particular
area are connected to the internet. Find the
probability that at least 73 of a random sample
of 100 households are connected to the internet.

6. A die is biased so that 1 in 5 throws results in a
six. Find the probability that, when the die is
thrown 300 times, the number that result in a six
(a) is more than 70,
(b) is at least 70,
(c) is less than 57.

7. 70% of the strawberry plants of a particular
variety produce more than ten strawberries per
plant. Find the probability that a random sample
of 50 plants of this variety consists of more
than 37 plants which produce more than ten
strawberries per plant.


UNBIASED ESTIMATES OF POPULATION PARAMETERS 


In order to define a binomial distribution you need to know n and p; to define a Poisson
distribution you need to know $\lambda$; and to define a normal distribution you need to know $\mu$
and $\sigma$. These are known as the population parameters of the distributions.

Suppose that you do not know the value of a particular parameter of a distribution, for
example the mean or the variance or the proportion of successes. It seems sensible that you
would take a random sample from the distribution and use it in some way to make an
estimate of the value of your unknown parameter.

This estimate is unbiased if the average (or expectation) of a large number of values taken in
the same way is the true value of the parameter. There may be several ways of obtaining an
unbiased estimate, but the best (most efficient) estimate is the one with the smallest variance.


POINT ESTIMATES 


If the random sample taken is of size n,

• the best unbiased estimate of p, the proportion of successes in the population, is $\hat{p}$ where
$\hat{p} = p_s$, and $p_s$ is the proportion of successes in the sample,

• the best unbiased estimate of $\mu$, the population mean, is $\hat{\mu}$ where
$\hat{\mu} = \bar{x}$, and $\bar{x}$ is the mean of the sample,

• the best unbiased estimate of $\sigma^2$, the population variance, is $\hat{\sigma}^2$ where
$\hat{\sigma}^2 = \frac{n}{n-1}s^2$, and $s^2$ is the variance of the sample.




There are alternative formats for $\hat{\sigma}^2$:

$\hat{\sigma}^2 = \frac{n}{n-1}s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{n}{n-1}\left(\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2\right)$

NOTE: If you are using your calculator in SD mode, it is possible to find the value of
$\hat{\sigma}$ directly. Look for a key marked $[\sigma_{n-1}]$. On some models this is obtained by pressing
[SHIFT] [3]. Find the key on your model.
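The equivalence of these formats can be checked on any data set. This Python sketch evaluates three of the forms for a small hypothetical sample; in the standard library, `statistics.variance` and `statistics.stdev` use the divisor n - 1 (like the calculator's $\sigma_{n-1}$ key), while `statistics.pvariance` gives $s^2$ with divisor n. The function name is an assumption for illustration.

```python
from statistics import mean, pvariance, stdev, variance

def unbiased_variance_forms(xs):
    """Evaluate three equivalent formulas for the unbiased estimate of sigma^2."""
    n = len(xs)
    xbar = mean(xs)
    s2 = pvariance(xs)                                  # s^2, divisor n
    form1 = n / (n - 1) * s2                            # n s^2 / (n - 1)
    form2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sum of squared deviations / (n - 1)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    form3 = (sum_x2 - sum_x ** 2 / n) / (n - 1)         # from the totals
    return form1, form2, form3

data = [2, 4, 4, 5, 7]                   # hypothetical sample
print(unbiased_variance_forms(data))     # each value is 3.3, up to rounding
print(variance(data), stdev(data))       # library sigma-hat^2 and sigma-hat
```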


Example 9.17

A railway enthusiast simulates train journeys and records the number of minutes, x, to the
nearest minute, that trains are late according to the schedule being used. A random sample of
50 journeys gave the following times.
17 5 3 10 4 3 10 5 2 14
3 14 5 5 2 9 22 36 14 34
22 4 23 6 8 15 41 23 13 7
6 13 33 8 5S 34 26 17 8 43
24 14 23 4 19 S23 13 12 10
Given that $\sum x = 738$ and $\sum x^2 = 16\,526$, calculate, to two decimal places, unbiased estimates of
the mean and the variance of the population from which this sample was drawn. (L)


Solution 9.17 


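X is the number of minutes that the train is late.
Let $E(X) = \mu$ and $\text{Var}(X) = \sigma^2$.

Unbiased estimate of $\mu$:

$\hat{\mu} = \bar{x} = \frac{\sum x}{n} = \frac{738}{50} = 14.76$

Unbiased estimate of $\sigma^2$:

$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$
$= \frac{1}{49}\left(16\,526 - \frac{738^2}{50}\right)$
$= 114.961...$
$= 114.96$ (2 d.p.)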
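When only the summary totals are given, as here, the calculation can be sketched in Python as below. The helper name `unbiased_estimates` is hypothetical; it simply encodes $\hat{\mu} = \frac{\sum x}{n}$ and $\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$.

```python
def unbiased_estimates(n, sum_x, sum_x2):
    """Unbiased estimates of mu and sigma^2 from n, the sum of x and the sum of x^2."""
    mu_hat = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mu_hat, var_hat

mu_hat, var_hat = unbiased_estimates(50, 738, 16526)   # totals from Example 9.17
print(round(mu_hat, 2), round(var_hat, 2))             # 14.76 114.96
```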




Try this using the raw data and your calculator in SD mode (see page 40). Input the data as
follows.

Casio 570W/85W/85WA
Set SD mode: [MODE] [MODE] [1] or [MODE] [2]
Clear memories: [SHIFT] [Scl] [=]
Input data: 17 [DT]
5 [DT]
3 [DT]
10 [DT] ...
To obtain
$\hat{\sigma}^2 = 114.961...$: [SHIFT] [3] [x²] [=]
$\bar{x} = 14.76$: [SHIFT] [1] [=]
You can also check
$\sum x = 738$: [RCL] [B]
$\sum x^2 = 16\,526$: [RCL] [A]
n = 50: [RCL] [C]
To clear SD mode: [MODE]
Example 9.18

For the data given in Example 9.17, estimate the proportion of trains that are more than
25 minutes late.

Solution 9.18

Number in sample that are more than 25 minutes late = 7
Proportion in sample, $p_s = \frac{7}{50} = 0.14$

Unbiased estimate of population proportion, p, is $\hat{p}$, where $\hat{p} = p_s = 0.14$.


INTERVAL ESTIMATES

Another way of using a sample value to give a good idea of the value of an unknown population
parameter is to construct an interval, known as a confidence interval.

A confidence interval is an interval that has a specified probability of including the parameter.
The interval is usually written (a, b) and the end-values, a and b, are known as confidence limits.
The probabilities most often used in confidence intervals are 90%, 95% and 99%.

Suppose you do not know the mean $\mu$ of a particular population and you want to work out a
95% confidence interval for it. You would need to construct an interval (a, b) so that

$P(a < \mu < b) = 0.95$.

In this case, the probability that the interval includes $\mu$ is 0.95, or 95%.


The interval that you construct uses the value of the mean of a random sample of size n taken
from the population. This mean is denoted by $\bar{x}$.

Before constructing your interval for $\mu$, it is essential to ask the following questions.

• Is the distribution of the population normal or not?
• Do you know the variance of the population?
• Is the sample size large or small?

Your answers will then determine how to proceed. The following theory illustrates various
situations.


(a) Confidence interval for $\mu$, the population mean

• of a normal population,
• with known variance $\sigma^2$,
• using any size sample, n large or small


Consider first how to calculate the end-values of the most commonly used interval, the 
95% confidence interval. The method can then be adapted for other levels of confidence. 


Note that it is useful to be able to follow the theory for the derivation of the end-points, but in 
practice you will probably only need to be able to apply the formula. 


As you saw on page 438, for random samples of size n,

if $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Standardising, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.

Consider the distribution of Z.
For a 95% confidence interval you need to find the values of z between which the central
95% of the distribution lies. This means that the upper tail probability is 0.025, so the
probability below the upper value of z is 0.975.

$P(Z < z) = 0.975$
$z = \Phi^{-1}(0.975)$
$= 1.96$

The values of z are $\pm 1.96$.
So $P(-1.96 < Z < 1.96) = 0.95$,

i.e. $P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$.

Now consider the inequality in two parts:

$-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \;\Rightarrow\; \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$

$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96 \;\Rightarrow\; \bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu$



Writing these two inequalities in one statement gives

$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$

Therefore the probability statement is

$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$

This enables you to construct the 95% confidence interval for $\mu$.

Comparing this with $P(a < \mu < b) = 0.95$, if the mean obtained from your sample is $\bar{x}$, then the
end-values, or confidence limits, are

$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}}$ and $\bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$.

These are sometimes written $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$.

If $\bar{x}$ is the mean of a random sample of any size n taken from a normal population with
known variance $\sigma^2$,
then a 95% confidence interval for $\mu$ is given by

$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$
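The boxed result translates directly into a short function. This Python sketch (hypothetical name `z_interval`) computes the critical value from the confidence level with `statistics.NormalDist().inv_cdf` rather than reading it from tables, so it uses 1.95996... instead of 1.96; at the accuracy quoted in the examples the difference does not matter. It is only valid under the stated assumptions: a normal population with known $\sigma$.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Symmetric confidence interval for mu when sigma is known."""
    # critical value z with Phi(z) = 1 - (1 - confidence)/2, about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lower, upper = z_interval(5.12, 0.042, 5)   # figures from Example 9.19
print(round(lower, 2), round(upper, 2))     # 5.08 5.16
```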


Example 9.19 


The mass of vitamin E in a capsule manufactured by a certain drug company is normally 
distributed with standard deviation 0.042 mg. A random sample of five capsules was analysed 
and the mean mass of vitamin E was found to be 5.12 mg. Calculate a symmetric 95% 
confidence interval for the population mean mass of vitamin E per capsule. Give the values of 
the end-points of the interval correct to three significant figures. (C)


Solution 9.19 


X is the mass, in milligrams, of vitamin E in a capsule.
$X \sim N(\mu, \sigma^2)$ with $\sigma = 0.042$.

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ with n = 5. X is normally distributed, so any size random sample is acceptable.

The 95% confidence interval for $\mu$ is $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$.

$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 5.12 \pm 1.96 \times \frac{0.042}{\sqrt{5}}$
$= 5.12 \pm 0.0368...$ (store this value in your calculator)

Lower confidence limit = 5.12 - 0.0368... = 5.08 (3 s.f.)
Upper confidence limit = 5.12 + 0.0368... = 5.16 (3 s.f.)

So the 95% confidence interval for $\mu$, based on the sample mean, is (5.08 mg, 5.16 mg).
NOTE: The 95% describes the method rather than this one interval: an interval constructed in
this way has probability 0.95 of including, or trapping, $\mu$. If you took another random sample of
the same size, you would probably get a different interval. If you took lots of samples in a
similar way then, on average, 95% of these intervals would include the true population mean $\mu$.

The following computer simulation illustrates the intervals obtained when 100 confidence
intervals are constructed, each with 95% confidence. On average, 5% do not include $\mu$.

In practice, you would only construct one interval. Remember that there is a 5% chance that
your interval does not include $\mu$.

[Figure: 100 simulated 95% confidence intervals for $\mu$]
The intervals shown in bold are the ones which do not include $\mu$. You will see that in this case
just six of the 100 do not include $\mu$. On average 95% of intervals constructed in this way will
include the true population mean.
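The repeated-sampling interpretation can be imitated with a small simulation like the one described above. In this Python sketch the population values $\mu = 50$, $\sigma = 4$, the sample size n = 10 and the number of trials are arbitrary assumptions, and a fixed seed makes the run repeatable. The returned fraction should be close to, but not exactly, 0.95.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=4.0, n=10, trials=1000, seed=1):
    """Fraction of simulated 95% z-intervals that contain the true mean mu."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(0.975)        # about 1.96
    half_width = z * sigma / sqrt(n)       # same width for every sample (sigma known)
    hits = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

print(coverage())   # close to 0.95
```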


Critical z-values in confidence intervals 


The z-value in the confidence interval is known as the critical value and is obtained for
different levels of confidence as follows:

In a 90% confidence interval,
the upper tail probability is 0.05,
so the lower tail probability is 0.95.

$P(Z < z) = 0.95$
i.e. $\Phi(z) = 0.95$
$z = \Phi^{-1}(0.95)$
$= 1.645$

In a 95% confidence interval,
the upper tail probability is 0.025,
so the lower tail probability is 0.975.

$P(Z < z) = 0.975$
i.e. $\Phi(z) = 0.975$
$z = \Phi^{-1}(0.975)$
$= 1.96$

In a 99% confidence interval,
the upper tail probability is 0.005,
so the lower tail probability is 0.995.

$P(Z < z) = 0.995$
i.e. $\Phi(z) = 0.995$
$z = \Phi^{-1}(0.995)$
$= 2.576$


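Summary

A symmetric 90% confidence interval for $\mu$ is $\bar{x} \pm 1.645\frac{\sigma}{\sqrt{n}}$.
A symmetric 95% confidence interval for $\mu$ is $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$.
A symmetric 99% confidence interval for $\mu$ is $\bar{x} \pm 2.576\frac{\sigma}{\sqrt{n}}$.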


Table of critical values

In some tables, the most commonly used critical z-values are summarised as follows.
For each value of p, the table gives the value of z such that $P(Z < z) = p$.

p   0.75   0.90   0.95   0.975  0.99   0.995  0.9975 0.999  0.9995
z   0.674  1.282  1.645  1.960  2.326  2.576  2.807  3.090  3.291

So,
• for a 90% confidence interval, $P(Z < z) = 0.95$; p = 0.95 gives z = 1.645,
• for a 95% confidence interval, $P(Z < z) = 0.975$; p = 0.975 gives z = 1.96,
• for a 99% confidence interval, $P(Z < z) = 0.995$; p = 0.995 gives z = 2.576.

If you want a 98% confidence interval, this implies an upper tail probability of 0.01.
$P(Z < z) = 0.99$ gives z = 2.326.

If the value of p for the confidence interval that you require is not in this summary table, you
will need to work from the main body of the normal distribution table.
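The tabulated critical values can be reproduced with the inverse of $\Phi$. This Python sketch uses `statistics.NormalDist().inv_cdf` from the standard library; the helper `critical_z` is a hypothetical name that converts a two-sided confidence level into its z-value, for example 0.98 into 2.326.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal, N(0, 1)

def critical_z(confidence):
    """z such that P(-z < Z < z) = confidence, i.e. Phi(z) = 1 - (1 - confidence)/2."""
    return Z.inv_cdf(1 - (1 - confidence) / 2)

ps = [0.75, 0.90, 0.95, 0.975, 0.99, 0.995, 0.9975, 0.999, 0.9995]
table = [round(Z.inv_cdf(p), 3) for p in ps]
print(table)   # [0.674, 1.282, 1.645, 1.96, 2.326, 2.576, 2.807, 3.09, 3.291]
print([round(critical_z(c), 3) for c in (0.90, 0.95, 0.98, 0.99)])   # [1.645, 1.96, 2.326, 2.576]
```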


(b) Confidence interval for $\mu$, the population mean
• of a non-normal population,
• with known variance $\sigma^2$,
• using a large sample, n > 30 say

In this case, since the sample size is large, the central limit theorem can be used.
$\bar{X}$ is approximately normal and $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately (see page 442).

If $\bar{x}$ is the mean of a random sample of size n, where n is large (n > 30), taken from a
non-normal population with known variance $\sigma^2$,
then an approximate 95% confidence interval for $\mu$ is given by

$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$


Example 9.20 


The heights of men in a particular district are distributed with mean $\mu$ cm and standard
deviation $\sigma$ cm.

On the basis of the results obtained from a random sample of 100 men from the district, the
95% confidence interval for $\mu$ was calculated and found to be (177.22 cm, 179.18 cm).

Calculate

(a) the value of the sample mean,

(b) the value of $\sigma$,

(c) a symmetric 90% confidence interval for $\mu$.


Solution 9.20 


Let X be the height, in centimetres, of a man in the district.
The distribution of X is not known, but $E(X) = \mu$ and $\text{Var}(X) = \sigma^2$.

Since the sample size, n, is large (n = 100), using the central limit theorem, the distribution of
$\bar{X}$ is approximately normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$.

(a) A 95% confidence interval for $\mu$ is given by

$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ with n = 100, so $\sqrt{n} = 10$.

Since the interval is (177.22, 179.18),

$\bar{x} - 1.96\frac{\sigma}{10} = 177.22$ ... (1)

$\bar{x} + 1.96\frac{\sigma}{10} = 179.18$ ... (2)

Adding (1) and (2): $2\bar{x} = 356.4$
$\bar{x} = 178.2$

The sample mean is 178.2 cm.

(b) Subtracting, (2) - (1): $2 \times 1.96\frac{\sigma}{10} = 1.96$
$\sigma = 5$

(c) A symmetric 90% confidence interval for $\mu$ is

$\left(\bar{x} - 1.645\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.645\frac{\sigma}{\sqrt{n}}\right)$

$\bar{x} \pm 1.645\frac{\sigma}{\sqrt{n}} = 178.2 \pm 1.645 \times \frac{5}{10}$
$= 178.2 \pm 0.8225$

So the 90% confidence interval = $(178.2 - 0.8225,\ 178.2 + 0.8225)$
= (177.38 cm, 179.02 cm) (2 d.p.)
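The two simultaneous equations in parts (a) and (b) amount to taking the centre and the half-width of the interval. A minimal Python sketch, assuming the interval was built as $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$ with z = 1.96 (the helper name is hypothetical):

```python
from math import sqrt

def mean_and_sigma_from_interval(lower, upper, n, z=1.96):
    """Recover x-bar and sigma from a symmetric z-interval (lower, upper)."""
    xbar = (lower + upper) / 2                    # centre of the interval
    sigma = (upper - lower) / (2 * z) * sqrt(n)   # half-width = z * sigma / sqrt(n)
    return xbar, sigma

xbar, sigma = mean_and_sigma_from_interval(177.22, 179.18, 100)
half_90 = 1.645 * sigma / sqrt(100)               # part (c): 90% half-width
print(xbar, sigma)                                # about 178.2 and 5.0
print(xbar - half_90, xbar + half_90)             # about 177.3775 and 179.0225
```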


Example 9.21 


A plant produces steel sheets whose weights are known to be normally distributed with a 
standard deviation of 2.4 kg. A random sample of 36 sheets had a mean weight of 31.4 kg. 
Find 99% confidence limits for the population mean. (L) 


Solution 9.21 
X is the weight, in kilograms, of a steel sheet. Then $X \sim N(\mu, 2.4^2)$.
A sample of size 36 is taken, so n = 36 and the sample mean $\bar{x} = 31.4$.

The end-values of a 99% confidence interval for $\mu$ are

$\bar{x} \pm 2.576\frac{\sigma}{\sqrt{n}} = 31.4 \pm 2.576 \times \frac{2.4}{\sqrt{36}}$
$= 31.4 \pm 1.0304$

so the 99% confidence interval is $(31.4 - 1.0304,\ 31.4 + 1.0304) = (30.3696, 32.4304)$
= (30.4 kg, 32.4 kg) (3 s.f.)


Width of a confidence interval 


In Example 9.21,

width of the 99% confidence interval $= 2 \times 2.576 \times \frac{\sigma}{\sqrt{n}} = 2 \times 2.576 \times \frac{2.4}{\sqrt{36}}$
= 2.0608 kg.

For the same data,

width of the 95% confidence interval $= 2 \times 1.96 \times \frac{\sigma}{\sqrt{n}} = 2 \times 1.96 \times \frac{2.4}{\sqrt{36}}$
= 1.568 kg.

For a given sample size n,
the greater the level of confidence, the wider the confidence interval.
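The width comparison can be sketched as a one-line function. In this Python example the name `interval_width` is illustrative, and the z-values are the table values used in the text.

```python
from math import sqrt

def interval_width(z, sigma, n):
    """Total width 2 * z * sigma / sqrt(n) of a symmetric z-interval."""
    return 2 * z * sigma / sqrt(n)

w99 = interval_width(2.576, 2.4, 36)   # Example 9.21, 99% level
w95 = interval_width(1.96, 2.4, 36)    # same data, 95% level
print(w99, w95)                        # about 2.0608 and 1.568
```

Since the width is proportional to z, and z grows with the confidence level, the 99% interval is always the wider of the two for a fixed n.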
Determination of sample size 


Example 9.22 


The result X of a stress test is known to be a normally distributed random variable with mean
$\mu$ and standard deviation 1.3. It is required to have a 95% symmetrical confidence interval for
$\mu$ with total width less than 2. Find the least number of tests that should be carried out to
achieve this.


Solution 9.22 


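$X \sim N(\mu, 1.3^2)$, and for samples of size n, $\bar{X} \sim N\left(\mu, \frac{1.3^2}{n}\right)$.

The 95% confidence interval for $\mu$ is

$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$

Interval width $= 2 \times 1.96\frac{\sigma}{\sqrt{n}}$
$= 2 \times 1.96 \times \frac{1.3}{\sqrt{n}}$
$= \frac{5.096}{\sqrt{n}}$

The width of the interval must be less than 2, so

$\frac{5.096}{\sqrt{n}} < 2$

$\sqrt{n} > 2.548$
$n > 2.548^2$
i.e. $n > 6.49...$

The least number of tests that should be carried out is 7.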


Now suppose that, in Example 9.22, the 95% confidence interval for $\mu$ must have a width less
than 1. Will the sample size n be larger or smaller than when the total width was less than 2?

To answer this, look at the inequality $\frac{5.096}{\sqrt{n}} < 2$ in Solution 9.22. This now becomes

$\frac{5.096}{\sqrt{n}} < 1$
i.e. $\sqrt{n} > 5.096$
$n > 25.96...$

So the least number of tests would be 26. The sample size has increased.
For a given confidence level,
the smaller the interval width, the larger the sample size required.


Now consider the situation in Example 9.22, where the total width must be less than 2, but
the confidence level is increased to 99%.

Will the sample size n be larger or smaller than that required for the 95% confidence interval?

The calculations needed to find the width are similar to those given in Solution 9.22,
but the value 1.96 is replaced by 2.576, the z-value for the 99% confidence interval.

So interval width $= 2 \times 2.576 \times \frac{1.3}{\sqrt{n}} = \frac{6.6976}{\sqrt{n}}$

Therefore

$\frac{6.6976}{\sqrt{n}} < 2$
$\sqrt{n} > 3.3488$
$n > 11.21...$

For the 99% confidence interval, the least number of tests required is 12, whereas for the
95% confidence interval it was 7, so the sample size must be larger.

For a given interval width,
the greater the level of confidence, the larger the sample size required.
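The three sample-size calculations follow the same pattern: the condition $2z\frac{\sigma}{\sqrt{n}} < w$ rearranges to $n > \left(\frac{2z\sigma}{w}\right)^2$, and the answer is the next whole number above that bound. A Python sketch, with a hypothetical function name:

```python
from math import floor

def least_sample_size(z, sigma, max_width):
    """Smallest whole n for which 2 * z * sigma / sqrt(n) < max_width."""
    bound = (2 * z * sigma / max_width) ** 2   # the condition is n > bound
    return floor(bound) + 1                    # strict inequality: next integer above

print(least_sample_size(1.96, 1.3, 2))    # 7   (Example 9.22)
print(least_sample_size(1.96, 1.3, 1))    # 26  (narrower interval)
print(least_sample_size(2.576, 1.3, 2))   # 12  (99% confidence)
```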


(c) Confidence interval for $\mu$, the population mean
• of a normal or non-normal population,
• with unknown variance $\sigma^2$,
• using a large sample, n > 30 say

When calculating confidence intervals it is often the case that the population variance, $\sigma^2$, is
not known. Provided that the sample size, n, is large (n > 30 say), it is permissible to use $\hat{\sigma}^2$,
the best unbiased estimate of $\sigma^2$ (see page 447).

Ideally the distribution of X should be normal, but an approximate confidence interval can
also be given when the distribution of X is not normal. Remember that in both cases, n must
be large.

Provided that n is large (n > 30 say),
an approximate 95% confidence interval for $\mu$ is

$\left(\bar{x} - 1.96\frac{\hat{\sigma}}{\sqrt{n}},\; \bar{x} + 1.96\frac{\hat{\sigma}}{\sqrt{n}}\right)$

where $\hat{\sigma}^2 = \frac{n s^2}{n-1}$
or $\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$
Example 9.23

The fuel consumption of a new model of car is being tested. In one trial, 50 cars chosen at
random were driven under identical conditions and the distances, x km, covered on 1 litre of
petrol were recorded. The results gave the following totals:

$\sum x = 525$, $\sum x^2 = 5625$.

Calculate a 95% confidence interval for the mean petrol consumption, in kilometres per litre,
of cars of this type.


Solution 9.23 


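$\bar{x} = \frac{\sum x}{n} = \frac{525}{50} = 10.5$

$\sigma^2$ is unknown, so use $\hat{\sigma}^2$ where $\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$
$= \frac{1}{49}\left(5625 - \frac{525^2}{50}\right)$
$= 2.2959...$

95% confidence limits for $\mu$ are

$\bar{x} \pm 1.96\frac{\hat{\sigma}}{\sqrt{n}} = 10.5 \pm 1.96 \times \frac{\sqrt{2.2959...}}{\sqrt{50}}$
$= 10.5 \pm 0.42$

95% confidence interval for $\mu$ = (10.08 km/litre, 10.92 km/litre)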


Example 9.24
The height, x cm, of each man in a random sample of 200 men living in the UK was
measured. The following results were obtained:

$\sum x = 35\,050$, $\sum x^2 = 6\,163\,109$.

(a) Calculate unbiased estimates of the mean and variance of the heights of men living in
the UK.

(b) Determine an approximate 90% confidence interval for the mean height of men living in
the UK. Name the theorem that you have assumed. (NEAB)


Solution 9.24 


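(a) $\hat{\mu} = \bar{x} = \frac{\sum x}{n} = \frac{35\,050}{200} = 175.25$

$\hat{\sigma}^2 = \frac{n}{n-1}s^2$
$= \frac{n}{n-1}\left(\frac{\sum x^2}{n} - \bar{x}^2\right)$
$= \frac{200}{199}\left(\frac{6\,163\,109}{200} - 175.25^2\right)$
$= 103.5$

Alternatively
$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$
$= \frac{1}{199}\left(6\,163\,109 - \frac{35\,050^2}{200}\right)$
$= 103.5$

(b) The confidence limits for a 90% confidence interval for $\mu$ are

$\bar{x} \pm 1.645\frac{\hat{\sigma}}{\sqrt{n}} = 175.25 \pm 1.645 \times \frac{\sqrt{103.5}}{\sqrt{200}}$
$= 175.25 \pm 1.1833...$

So the 90% confidence interval is $(175.25 - 1.1833...,\ 175.25 + 1.1833...)$
= (174.07 cm, 176.43 cm) (2 d.p.)

The central limit theorem has been used to give an approximate distribution for $\bar{X}$, the
sample mean, where $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.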
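Examples 9.23 and 9.24 use the same recipe: estimate $\sigma^2$ by $\hat{\sigma}^2$ from the totals, then form $\bar{x} \pm z\frac{\hat{\sigma}}{\sqrt{n}}$. The Python sketch below (hypothetical name `large_sample_interval`) encodes this; like the text, it relies on n being large so that the central limit theorem makes the interval approximately valid.

```python
from math import sqrt

def large_sample_interval(n, sum_x, sum_x2, z):
    """Approximate z-interval for mu using the unbiased estimate of sigma^2."""
    xbar = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # unbiased estimate of sigma^2
    half_width = z * sqrt(var_hat / n)
    return xbar - half_width, xbar + half_width

print(large_sample_interval(50, 525, 5625, 1.96))          # Example 9.23: about (10.08, 10.92)
print(large_sample_interval(200, 35050, 6163109, 1.645))   # Example 9.24: about (174.07, 176.43)
```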


1, The concentrations, in milligrams per litre, of a 
trace element in 7 randomly chosen samples of 
water from a spring were 


240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5.
Determine the unbiased estimates of the mean 


and the variance of the concentration of the trace 
element per litre of water from the spring. (L) 


2. Find the best unbiased estimates of the mean $\mu$
and variance $\sigma^2$ of the population from which
each of the following samples is drawn. It is a
good idea to do parts (a) to (c) both with and
without a calculator.

(a) 46, 48, 51, 50, 45, 53, 50, 48

(b) 1.684, 1.691, 1.687, 1.688, 1.689,
1.688, 1.690, 1.693, 1.685

(d) $\sum x = 120$, $\sum x^2 = 2102$, n = 8
(e) $\sum x = 100$, $\sum x^2 = 1028$, n = 10
(f) n = 34, $\sum x = 330$, $\sum x^2 = 23\,700$


3. A measuring rule was used to measure the length
of a rod of stated length 1 m. On 8 successive
occasions the following results, in millimetres,
were obtained.

1000, 999, 999, 1002, 1001, 1000, 1002, 1001.

Calculate unbiased estimates of the mean and, to
two significant figures, the variance of the errors
occurring when the rule is used for measuring a
1 m length. (L)


4. Cartons of orange are filled by a machine. A 
sample of 10 cartons selected at random from 
the production contained the following 
quantities (in millilitres) 

201.2 205.0 209.1 202.3 204.6 
206.4 210.1 201.9 203.7 207.3 


Calculate unbiased estimates of the mean and 
variance of the population from which the 
sample was taken. (L) 


5. A certain type of tennis ball is known to have a
height of bounce which is normally distributed
with standard deviation 2 cm. A sample of 60
tennis balls is tested and the mean height of
bounce of the sample is 140 cm.

(a) Find a 95% confidence interval for the mean
height of bounce of this type of tennis ball.

(b) State any assumptions made in calculating
your interval.


6. A random sample of 6 items taken from a
normal population with mean $\mu$ and variance
4.5 cm² gave the following data:

Sample values: 12.9 cm, 13.2 cm, 14.6 cm,
12.6 cm, 11.3 cm, 10.1 cm.

(a) Find the 95% confidence interval for $\mu$.
(b) What is the width of this confidence interval?


7. A factory produces cans of meat whose masses 
are normally distributed with standard deviation 
18 g. A random sample of 25 cans is found to 
have a mean mass of 458 g. 


(a) Obtain the 99% confidence interval for the
population mean mass of a can of meat 
produced at the factory. 

(b) Explain what the interval means. 

(c) Would the interval be wider if a 90% 
confidence interval was calculated? 
Explain your reasoning. 


8. A random sample of 100 observations from a
normal population with mean $\mu$ gave the
following data: $\sum x = 8200$, $\sum x^2 = 686\,800$.

(a) Find a 98% confidence interval for $\mu$.

(b) Find a 99% confidence interval for $\mu$.

(c) Would your answers have been different if
the population was not normal?
Explain your answer.


9. Eighty employees at an insurance company were 
asked to measure their pulse rates when they 
woke up in the morning. The researcher then 
calculated the mean and the standard deviation 
of the sample and found these to be 69 beats and 
4 beats respectively. Calculate a 97% confidence 
interval for the mean pulse rate of all the 
employees at the company, stating any 
assumptions that you have made. 


10. One hundred and fifty bags of flour are taken 
from a production line and found to have a mean 
mass of 748 g and standard deviation of 3.6 g. 


(a) Calculate an unbiased estimate of the 
standard deviation of a bag of flour 
produced on this production line. 

(b) Calculate a 98% confidence interval for the 
mean mass of a bag of flour produced on 
this production line. 

(c) State any assumptions you have made.


11. (a) A 95% confidence interval for the mean
length of life of a particular brand of light
bulb was calculated and the confidence
limits were 1023.3 hours and 1101.7 hours.
The interval was based on the results of a
random sample of 36 light bulbs. Find the
99% confidence interval for $\mu$, the mean
length of life of this brand of light bulb.

(b) Forty random samples of 36 light bulbs are
taken and a 90% confidence interval for $\mu$ is
calculated for each sample. Find the
expected number of intervals that contain $\mu$.


12. An efficiency expert wishes to determine the 
mean time taken to drill a number of holes in a 
metal sheet. Determine how large a random 
sample is needed so that the expert can be 95% 
certain that the sample mean will differ from the 
true mean time by less than 15 seconds. Assume 
that it is known from previous studies that the 
population standard deviation is 40 seconds. (L) 


13. A random sample of 60 loaves is taken from a
population whose masses are normally
distributed with mean $\mu$ and standard deviation
… g.

(a) Calculate the width of a symmetric
95% confidence interval for $\mu$ based on this
sample.

(b) Find the confidence level of a symmetric
confidence interval having the same
width as before but based on a random
sample of 40 loaves.


14. The distribution of measurements of thicknesses
of a random sample of yarns produced in a
textile mill is shown in the following table.

Yarn thickness in microns
(mid-interval values)      Frequency
72.5                       6
77.5                       18
82.5                       32
87.5                       57
92.5                       102
97.5                       51
102.5                      25
107.5                      9

Illustrate these data on a histogram.

Estimate, to two decimal places, the mean and
standard deviation of yarn thickness. Hence
estimate the standard error of the mean to two
decimal places, and use it to determine
approximate symmetric 95% confidence limits,
giving your answer to one decimal place. (MEI)


15. The age, X, in years at last birthday, of 250
mothers when their first child was born is given
in the following table:

Age, X (years)    Number of mothers
18-               14
20-               36
22-               42
24-               57
26-               48
28-               26
30-               17
32-               7
34-               2
36-               0
38-               1

(The notation implies that, for example in row 1,
there are 14 mothers for whom the continuous
variable X satisfies 18 < X < 20.)

Calculate, to the nearest 0.1 of a year, estimates
of the mean and the standard deviation of X.

If the 250 mothers are a random sample from a
large population of mothers, find 95%
confidence limits for the mean age, $\mu$, of the total
population. (C)


16. The lifetimes of 200 electrical components were
recorded to the nearest hour and classified in the
frequency tabulation.

Lifetime (hours)   Frequency
0-                 80
100-               48
200-               30
300-               18
400-               10
500-               5
600-               4
700-               3
800-               2
900-               0
1000-              0

Draw a histogram of the data and estimate the
mean and standard deviation of the distribution.
Calculate a symmetric 90% confidence interval
for the population mean, using a suitable
normal approximation for the distribution of the
sample mean. (MEI)


(d) Confidence interval for $\mu$ when
• the population is normal,
• $\sigma^2$ is unknown,
• the sample size n is small

When calculating confidence intervals, you have already encountered the situation when large
samples (n > 30) are taken from a normal population with unknown variance $\sigma^2$.

For large samples,

$\frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}} \approx Z$, where $Z \sim N(0, 1)$.

But if the sample size is small (n < 30), $\frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$ no longer has a normal distribution.

For small samples,

$\frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}} = T$, where T has a t-distribution.

Before looking at confidence intervals for $\mu$ when the sample size is small, consider further the
t-distribution.


THE t-DISTRIBUTION 


The distribution of T is a member of a family of t-distributions. All t-distributions are
symmetric about zero and have a single parameter $\nu$ (pronounced 'new'), which is a positive
integer.

$\nu$ is known as the number of degrees of freedom of the distribution and if, for example, T has
a t-distribution with five degrees of freedom, you would write $T \sim t(5)$.

The diagram below shows two curves, t(2) and t(5), together with the standardised normal curve.

[Figure: t(2) and t(5) curves compared with the standardised normal curve]

Note that as $\nu$ increases, the corresponding $t(\nu)$ curve resembles the standardised normal
distribution N(0, 1) more and more closely. In fact when $\nu > 30$, the difference between the $t(\nu)$
distribution and the normal distribution is small.

For samples of size n, it can be shown that

$T = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$ follows a t-distribution with (n - 1) degrees of freedom.

For example, for a sample of size 8,
T follows a t-distribution with 7 degrees of freedom. You would write $T \sim t(7)$.


The 95% confidence interval for $\mu$ is obtained as follows:

If $\bar{x}$ and $s^2$ are the mean and variance of a small sample (n < 30) from a normal population
with unknown mean $\mu$ and unknown variance $\sigma^2$,
then a 95% confidence interval for $\mu$ is given by

$\left(\bar{x} - t\frac{\hat{\sigma}}{\sqrt{n}},\; \bar{x} + t\frac{\hat{\sigma}}{\sqrt{n}}\right)$, where $\hat{\sigma}^2 = \frac{n s^2}{n-1}$,

and t is the value from a t(n - 1) distribution such that $P(-t < T < t) = 0.95$,
i.e. (-t, t) encloses 95% of the t(n - 1) distribution.
To find the required value of t, known as the critical value, you will need to use ¢-distribution 


tables. These give the t-value such that P(T <1?) = p, for various values of . The tables are 
printed on page 650 and an extract is reproduced here. 
\ PUTst) =p 
1 


= 75 0.90 0.95 0.975 0.99 0.995 | 0.9975 0,999 9.9995 ye 
Hl 


1.000 3.078 6.314 12.71 31.82 63.66 127.3 318.3 636.6 | 
0.816 1.886 2.920 4.303 6.965 9.925 14.09 22,33 31.60 Fr 
0.765 1.638 2.353 3.182 4.541 5.841 7.453 10.21 12.92 
0.741 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.610 


0.727 1.476 2.015 2.571 3.365 4.032 4.773 5.893 6.869 
0.718 1.440 1.943 2.447 3,143 3.707 4.317 5.208 5.959 
0.711 1.415 1.895 2.365 2.998 3,499 4.029 4,785 5.408 
0.706 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041 i 
0.703 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781 


0.700 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587 
0.697 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437 


0.683 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.646 
0.681 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3,551 
0.679 A296 1.671 2.000 2.390 2.660 2.915 3.232 3.460 
0.677 yo 1.289 1.658 1.980 2.358 2.617 2.860 3.160 3,373 
0.674 / 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.291 


You will see that as $\nu$ increases, the corresponding t-distribution becomes more and more like the normal distribution. Compare the last row, $\nu = \infty$, with the critical values for the normal distribution, printed on page 649.


You will find that you use the t-distribution tables in a slightly different way from the normal tables, so you need to ensure that you can use them correctly.
In this extract, the values referred to in Examples 9.25(a) and 9.25(b) are in the rows marked (a) and (b).

ν \ p     0.75   0.90   0.95   0.975  0.99   0.995  0.9975  0.999  0.9995
1         1.000  3.078  6.314  12.71  31.82  63.66  127.3   318.3  636.6
2         0.816  1.886  2.920  4.303  6.965  9.925  14.09   22.33  31.60
⋮
10        0.700  1.372  1.812  2.228  2.764  3.169  3.581   4.144  4.587
(a) 11    0.697  1.363  1.796  2.201  2.718  3.106  3.497   4.025  4.437
12        0.695  1.356  1.782  2.179  2.681  3.055  3.428   3.930  4.318
13        0.694  1.350  1.771  2.160  2.650  3.012  3.372   3.852  4.221
(b) 14    0.692  1.345  1.761  2.145  2.624  2.977  3.326   3.787  4.140


Example 9.25(a)
Consider T following a t-distribution with 11 degrees of freedom, i.e. $\nu = 11$ and $T \sim t(11)$.
Find (i) $P(T < 1.796)$
(ii) $P(T > 3.106)$
(iii) $P(|T| < 2.201)$


Solution 9.25(a)

(i) $\nu = 11$, so find row 11 and go across to 1.796, then up to the top of the column. This gives 0.95.
$P(T < 1.796) = 0.95 = 95\%$

(ii) Find row 11 and go across to 3.106, which is in column 0.995.
So $P(T < 3.106) = 0.995$
$P(T > 3.106) = 1 - 0.995 = 0.005$
i.e. $P(T > 3.106) = 0.5\%$

(iii) You need $P(|T| < 2.201)$, i.e. $P(-2.201 < T < 2.201)$.
Find row 11 and go across to 2.201, which is in column 0.975.
So $P(T < 2.201) = 0.975$
$P(T > 2.201) = 1 - 0.975 = 0.025$
By symmetry, it follows that $P(T < -2.201) = 0.025$ also, so
$P(-2.201 < T < 2.201) = 1 - 0.025 - 0.025 = 0.95$
The t-values that enclose the central 95% are $\pm 2.201$.


Example 9.25(b)

The random variable T has a t-distribution with 14 degrees of freedom, i.e. $T \sim t(14)$. Find the value of $t$ such that

(i) $P(T < t) = 0.90$ (ii) $P(|T| < t) = 0.98$


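Solution 9.25(b)

(i) $\nu = 14$: row 14, column 0.90 gives $t = 1.345$.

(ii) If the central 98% lies between $-t$ and $t$, the two tails together hold 0.02, so the required value of $t$ corresponds to an upper tail probability of 0.01. The column heading $p$ must therefore be 0.99.
Row 14, column 0.99 gives $t = 2.624$.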
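The two ways of reading the table in Examples 9.25(a) and (b) can be sketched in Python. This is a minimal sketch, assuming only the rows for ν = 11 and ν = 14 from the extract are needed; `COLUMNS`, `ROWS`, `p_below`, `t_for_lower_tail` and `t_for_central` are hypothetical helper names, not part of any library.

```python
# Extract of the t-distribution table: column headings p = P(T < t)
COLUMNS = (0.75, 0.90, 0.95, 0.975, 0.99, 0.995, 0.9975, 0.999, 0.9995)

# Rows for nu = 11 and nu = 14, copied from the extract above
ROWS = {
    11: (0.697, 1.363, 1.796, 2.201, 2.718, 3.106, 3.497, 4.025, 4.437),
    14: (0.692, 1.345, 1.761, 2.145, 2.624, 2.977, 3.326, 3.787, 4.140),
}


def p_below(t_value: float, nu: int) -> float:
    """P(T < t_value): find t_value in row nu and read the column heading."""
    return COLUMNS[ROWS[nu].index(t_value)]


def t_for_lower_tail(p: float, nu: int) -> float:
    """t such that P(T < t) = p: go down column p to row nu."""
    return ROWS[nu][COLUMNS.index(p)]


def t_for_central(prob: float, nu: int) -> float:
    """t such that P(|T| < t) = prob; each tail holds (1 - prob) / 2."""
    p = round(1 - (1 - prob) / 2, 4)
    return t_for_lower_tail(p, nu)


# Example 9.25(a), T ~ t(11)
print(p_below(1.796, 11))                      # (i)   P(T < 1.796)
print(round(1 - p_below(3.106, 11), 4))        # (ii)  P(T > 3.106)
print(round(2 * p_below(2.201, 11) - 1, 4))    # (iii) P(|T| < 2.201), using symmetry

# Example 9.25(b), T ~ t(14)
print(t_for_lower_tail(0.90, 14), t_for_central(0.98, 14))
```

The lookup only works for values that actually appear in the table, which mirrors how the printed tables are used by hand.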


Critical t-values for a 95% confidence interval

For a 95% confidence interval, you want $t$ such that
$P(|T| < t) = 0.95$, i.e. $P(-t < T < t) = 0.95$.
This corresponds to an upper tail probability of 0.025,
so $P(T < t) = 0.975$.

So if $T \sim t(9)$, for example,
$P(T < t) = 0.975$ gives $t = 2.262$
and the critical values of $t$ for the 95% confidence interval are $\pm 2.262$.

ν \ p   0.75   0.90   0.95   0.975
1       1.000  3.078  6.314  12.71
2       0.816  1.886  2.920  4.303
3       0.765  1.638  2.353  3.182
4       0.741  1.533  2.132  2.776
5       0.727  1.476  2.015  2.571
6       0.718  1.440  1.943  2.447
7       0.711  1.415  1.895  2.365
8       0.706  1.397  1.860  2.306
9       0.703  1.383  1.833  2.262

NOTE: This value will be used in Example 9.26.


To find the critical values of $t$:
for a 95% confidence interval, look under column 0.975,
for a 90% confidence interval, look under column 0.95,
for a 98% confidence interval, look under column 0.99,
for a 99% confidence interval, look under column 0.995.
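The column rule in the list above amounts to $p = 1 - \tfrac{1}{2}(1 - \text{confidence})$. A small sketch, assuming Python 3.8+ for `statistics.NormalDist`, applies the rule and checks it against the $\nu = \infty$ row of the table, whose entries equal the normal critical values; `table_column` is a hypothetical helper name.

```python
from statistics import NormalDist


def table_column(confidence: float) -> float:
    """Column p = P(T < t) to read for a symmetric interval at this confidence level."""
    # The two tails share 1 - confidence equally, so the upper tail holds half of it.
    return round(1 - (1 - confidence) / 2, 4)


# When nu is infinite the t critical values equal the normal z-values.
for level in (0.90, 0.95, 0.98, 0.99):
    p = table_column(level)
    z = NormalDist().inv_cdf(p)
    print(f"{level:.0%} interval: column {p}, z = {z:.3f}")
```

The printed z-values, 1.645, 1.960, 2.326 and 2.576, agree with the $\nu = \infty$ row of the table.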
The following examples illustrate how to calculate confidence intervals when critical values are found using a t-distribution.


Example 9.26

The mass, in grams, of a packet of biscuits of a particular brand follows a normal distribution with mean $\mu$. Ten packets of biscuits are chosen at random and their masses noted. The results, in grams, are
397.3, 399.6, 401.0, 392.9, 396.8, 400.0, 397.6, 392.1, 400.8, 400.6

These can be summarised as follows: $\sum x = 3978.7$, $\sum x^2 = 1\,583\,098.3$.
Calculate a 95% confidence interval for $\mu$.


Solution 9.26

X is the mass, in grams, of a packet of biscuits.
$X \sim N(\mu, \sigma^2)$ with both $\mu$ and $\sigma^2$ unknown.

Since $\sigma^2$ is unknown, find $\hat{\sigma}^2$ (see page 447):
$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$ with $n = 10$
$= \dfrac{1}{9}\left(1\,583\,098.3 - \dfrac{3978.7^2}{10}\right)$
$= 10.325\ldots$
$\hat{\sigma} = 3.213\ldots$

The sample mean, $\bar{x} = \dfrac{\sum x}{n} = \dfrac{3978.7}{10} = 397.87$.

Since $n$ is small, a $t(n - 1)$ distribution is required.
$n = 10$, so use a $t(9)$ distribution.

The 95% confidence limits for $\mu$ are
$\bar{x} \pm t\dfrac{\hat{\sigma}}{\sqrt{n}}$, where $(-t, t)$ encloses 95% of the $t(9)$ distribution.

From tables, as illustrated on page 466, the critical value $t$ is 2.262.

So the confidence limits are $397.87 \pm 2.262 \times \dfrac{3.213\ldots}{\sqrt{10}} = 397.87 \pm 2.298\ldots$

95% confidence interval for $\mu$ $= (397.87 - 2.298\ldots,\ 397.87 + 2.298\ldots)$
= (395.6 g, 400.2 g) (1 d.p.)


Note that in Example 9.26, the value of $\hat{\sigma}$ can be obtained directly by using the calculator in standard deviation mode. Look for the key $\sigma_{n-1}$ (see page 448).

You should practise finding $\hat{\sigma}$ using the data of Example 9.26.
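Python's `statistics.stdev` uses the divisor $n - 1$, so, like the $\sigma_{n-1}$ key, it returns $\hat{\sigma}$ directly from the raw data. A minimal sketch of Example 9.26 on that basis, assuming the critical value 2.262 is read from the $t(9)$ row of the tables rather than computed:

```python
from math import sqrt
from statistics import fmean, stdev

# Masses (g) of the ten packets in Example 9.26
masses = [397.3, 399.6, 401.0, 392.9, 396.8, 400.0, 397.6, 392.1, 400.8, 400.6]

n = len(masses)
x_bar = fmean(masses)        # sample mean
sigma_hat = stdev(masses)    # divisor n - 1, the same quantity as the sigma_(n-1) key
t_crit = 2.262               # t(9), column 0.975, taken from the table extract

half_width = t_crit * sigma_hat / sqrt(n)
print(f"x_bar = {x_bar:.2f}, sigma_hat = {sigma_hat:.3f}")
print(f"95% CI: ({x_bar - half_width:.1f}, {x_bar + half_width:.1f})")
```

Working from the raw data avoids the rounding in $\sum x^2$, although here it changes $\hat{\sigma}$ only in the fourth significant figure.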


Example 9.27

A student, studying the height of a particular plant, knows that it follows a normal distribution with mean $\mu$ and variance $\sigma^2$, but he does not know the value of either of these parameters. He selects 15 plants at random, measures their heights and calculates that the mean height of the sample is 12.2 cm and the standard deviation is 1.4 cm. Using these values, calculate a 90% confidence interval for $\mu$. Calculate also the width of this interval.


Solution 9.27

X is the height, in centimetres, of a plant, where $X \sim N(\mu, \sigma^2)$.
Sample values: $\bar{x} = 12.2$ and $s = 1.4$, where $s$ is the standard deviation.

$\hat{\sigma}^2 = \dfrac{n}{n-1}s^2$ with $n = 15$
$= \dfrac{15}{14} \times 1.4^2 = 2.1$
$\hat{\sigma} = \sqrt{2.1} = 1.449\ldots$

Since $n = 15$, the $t(14)$ distribution is considered.

For a symmetric 90% confidence interval, column $p = 0.95$ is required in order to find the critical value of $t$.
When $\nu = 14$, $t = 1.761$,
so $(-1.761, 1.761)$ encloses the central 90% of the $t(14)$ distribution.

ν \ p   0.75   0.90   0.95
1       1.000  3.078  6.314
2       0.816  1.886  2.920
⋮
12      0.695  1.356  1.782
13      0.694  1.350  1.771
14      0.692  1.345  1.761
(Extract from tables on page 650)

The 90% confidence limits for $\mu$ are
$\bar{x} \pm t\dfrac{\hat{\sigma}}{\sqrt{n}} = 12.2 \pm 1.761 \times \dfrac{1.449\ldots}{\sqrt{15}}$
$= 12.2 \pm 0.658\ldots$

90% confidence interval $= (12.2 - 0.658\ldots,\ 12.2 + 0.658\ldots)$
= (11.54 cm, 12.86 cm) (2 d.p.)

Width of interval $= 2 \times 0.658\ldots$
= 1.32 cm (2 d.p.)
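Because $s$ in Example 9.27 is a standard deviation calculated with divisor $n$, it has to be squared before applying $\hat{\sigma}^2 = \dfrac{n}{n-1}s^2$. The hypothetical helper below makes that conversion explicit; it is a sketch, assuming the critical value 1.761 is read from the $t(14)$ row, column 0.95.

```python
from math import sqrt


def t_interval_from_s(x_bar: float, s: float, n: int, t_crit: float):
    """Symmetric interval x_bar +/- t * sigma_hat / sqrt(n).

    s is the sample standard deviation with divisor n, so it is first
    converted using sigma_hat^2 = n * s^2 / (n - 1).
    """
    sigma_hat = sqrt(n * s**2 / (n - 1))
    half_width = t_crit * sigma_hat / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Example 9.27: n = 15, x_bar = 12.2 cm, s = 1.4 cm, t(14) column 0.95 gives 1.761
low, high = t_interval_from_s(12.2, 1.4, 15, 1.761)
print(f"90% CI: ({low:.2f} cm, {high:.2f} cm), width {high - low:.2f} cm")
```

Note that $\hat{\sigma}/\sqrt{n} = s/\sqrt{n-1}$, so the same interval can also be written as $\bar{x} \pm t\,s/\sqrt{n-1}$.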


1. The heights, in metres, of a random sample of 6 policemen from a particular station were as follows:

1.80, 1.76, 1.79, 1.81, 1.83, 1.79.
Assuming that the heights of policemen from that station are normally distributed with mean $\mu$,

(a) calculate a 95% confidence interval for $\mu$,
(b) state the width of this interval.


2. A sample of 8 independent observations of a normally distributed variable gave the following values:

3.6, 3.9, 4.5, 3.8, 4.4, 4.9, 4.2, 3.8.

(a) Determine a 99% confidence interval for the population mean $\mu$.

(b) Find the difference between the widths of a 90% confidence interval for $\mu$ and a 95% confidence interval for $\mu$.


3. Twenty measurements of $x$, the life, in hours, of a particular make of candle gave the following data:

$\sum x = 172$, $\sum x^2 = 1495.5$.

Assuming that the length of life is modelled by a normal distribution with mean $\mu$, find a 98% confidence interval for $\mu$.


4. A random sample of 8 independent observations of a normal variable gave

$\sum x = 261.2$, $\sum (x - \bar{x})^2 = 3.22$.

Calculate a 95% confidence interval for the population mean.

If 400 such samples were taken, how many of these would you expect not to include the population mean?


5. A random sample of 7 independent observations of a normal variable gave

$\sum x = 35.9$, $\sum x^2 = 186.19$.
Calculate

(a) an unbiased estimate of the population mean,

(b) an unbiased estimate of the population standard deviation,

(c) a 90% confidence interval for the population mean.


6. The masses, in grams, of 13 washers selected 
from a production line at random are: 
15.4, 15.2, 14.6, 16.1, 14.8, 
15.3, 15.9, 16.0, 15.4, 14.6, 
15.0, 15.5, 16.1. 
Calculate 98% confidence limits for the mean 
mass of the washers on this particular 


production line, assuming that the mass can be 
modelled by a normal distribution. 


7. Fifteen pupils performed experiments to find the value of $g$, the acceleration due to gravity. Their results were as follows:

9.806, 9.807, 9.810, 9.802, 9.805,
9.806, 9.804, 9.811, 9.801, 9.804,
9.805, 9.808, 9.803, 9.809, 9.807.

Assuming that these are taken from a normal population, calculate 95% confidence limits for the value of $g$ based on these results.




CONFIDENCE INTERVALS FOR THE POPULATION PROPORTION p 


Imagine that you want to find $p$, the proportion of successes in a particular population. To obtain an idea of its value, you could take a random sample of size $n$ and calculate $p_s$, the proportion of successes in your sample. This would give the best unbiased estimate $\hat{p}$, where $\hat{p} = p_s$ (see page 447). You could also use this value of $p_s$ to obtain an interval estimate of $p$, known as a confidence interval for $p$.


The theory needed to derive the confidence interval for $p$ is based on the sampling distribution of proportions, $P_s$, described on page 445.

This states that, provided the sample size $n$ is large ($n > 30$), the distribution of $P_s$ is approximately normal, so $P_s \sim N\left(p, \dfrac{pq}{n}\right)$, where $q = 1 - p$.

The standard deviation of the sampling distribution of proportions, $\sqrt{\dfrac{pq}{n}}$, is needed in the calculation of the limits for the confidence interval. The difficulty is, however, that its value isn't known, since $p$ isn't known!

To overcome this, use $\hat{p} = p_s$. Writing $1 - p_s$ as $q_s$, the standard deviation of the sampling distribution is approximately $\sqrt{\dfrac{p_s q_s}{n}}$.

You are then able to find approximate confidence intervals for $p$ as follows:


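90% confidence
Confidence limits: $p_s \pm 1.645\sqrt{\dfrac{p_s q_s}{n}}$
Confidence interval: $\left(p_s - 1.645\sqrt{\dfrac{p_s q_s}{n}},\ p_s + 1.645\sqrt{\dfrac{p_s q_s}{n}}\right)$
Width: $2 \times 1.645\sqrt{\dfrac{p_s q_s}{n}}$

95% confidence
Confidence limits: $p_s \pm 1.96\sqrt{\dfrac{p_s q_s}{n}}$
Confidence interval: $\left(p_s - 1.96\sqrt{\dfrac{p_s q_s}{n}},\ p_s + 1.96\sqrt{\dfrac{p_s q_s}{n}}\right)$
Width: $2 \times 1.96\sqrt{\dfrac{p_s q_s}{n}}$

99% confidence
Confidence limits: $p_s \pm 2.576\sqrt{\dfrac{p_s q_s}{n}}$
Confidence interval: $\left(p_s - 2.576\sqrt{\dfrac{p_s q_s}{n}},\ p_s + 2.576\sqrt{\dfrac{p_s q_s}{n}}\right)$
Width: $2 \times 2.576\sqrt{\dfrac{p_s q_s}{n}}$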

Remember that the sample size, $n$, should be large ($n > 30$, say), since the normal approximation to the binomial distribution is used in obtaining the distribution of sample proportions. Also, since a continuous distribution has been used as an approximation for a discrete distribution, continuity corrections should strictly be used. They are, however, ignored when calculating confidence intervals for $p$.
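The three cases above differ only in the z-value. A minimal sketch in Python, using a hypothetical function name and, like the text, no continuity correction, computes the approximate interval from a count of successes; it assumes $n$ is large enough for the normal approximation.

```python
from math import sqrt

# z-values used for the approximate confidence limits for p
Z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def proportion_interval(successes: int, n: int, confidence: float = 0.95):
    """Approximate interval p_s +/- z * sqrt(p_s * q_s / n), without continuity correction."""
    p_s = successes / n
    q_s = 1 - p_s
    half_width = Z[confidence] * sqrt(p_s * q_s / n)
    return p_s - half_width, p_s + half_width


print(proportion_interval(45, 300))          # 45 defective items out of 300
print(proportion_interval(136, 400, 0.90))   # 136 carpet shops out of 400
```

The two calls use the figures of Examples 9.28 and 9.29, giving approximately (0.1096, 0.1904) and (0.301, 0.379) respectively.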


Example 9.28

A manufacturer wants to assess the proportion of defective items in a large batch produced by a particular machine. He tests a random sample of 300 items and finds that 45 are defective.

(a) Calculate an approximate 95% confidence interval for the proportion of defective items in the batch.

(b) If 200 such tests are performed and a 95% confidence interval calculated for each, how many of these intervals would you expect to include the proportion of defective items in the batch?


Solution 9.28

(a) $p_s = \dfrac{45}{300} = 0.15$, $q_s = 1 - p_s = 0.85$, $n = 300$.
The 95% confidence limits for $p$ are
$p_s \pm 1.96\sqrt{\dfrac{p_s q_s}{n}} = 0.15 \pm 1.96\sqrt{\dfrac{0.15 \times 0.85}{300}}$
$= 0.15 \pm 0.0404$

95% confidence interval $= (0.15 - 0.0404,\ 0.15 + 0.0404)$
= (0.1096, 0.1904)

(b) The expected number of tests that include the proportion of defective items in the batch is $0.95 \times 200 = 190$.
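The reasoning in part (b) can be checked by simulation. This sketch assumes a hypothetical true proportion of 0.15 (in practice $p$ is unknown), draws many random samples of 300 items, builds the approximate 95% interval from each, and reports the fraction of intervals that contain $p$; `coverage` is a hypothetical helper name. The fraction should come out close to 0.95, which is why about 190 of 200 intervals are expected to include $p$.

```python
import random
from math import sqrt


def coverage(p: float, n: int, trials: int, z: float = 1.96, seed: int = 1) -> float:
    """Fraction of simulated intervals p_s +/- z * sqrt(p_s * q_s / n) that contain p."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Number of successes in n independent trials, each with probability p
        successes = sum(rng.random() < p for _ in range(n))
        p_s = successes / n
        half_width = z * sqrt(p_s * (1 - p_s) / n)
        if p_s - half_width < p < p_s + half_width:
            hits += 1
    return hits / trials


# Hypothetical true proportion 0.15, samples of 300 items
print(coverage(0.15, 300, trials=2000))
```

Because the interval uses a normal approximation, the simulated fraction is typically a little below 0.95 rather than exactly equal to it.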


Example 9.29
In a random sample of 400 carpet shops, it was discovered that 136 of them sold carpets at below the list prices recommended by the manufacturer.
(a) Estimate the percentage of all carpet shops selling below list price.
(b) Calculate an approximate 90% confidence interval for the proportion of shops that sell below list price and explain briefly what this means.
(c) What sample size would have to be taken in order to estimate the percentage to within ±2%, with 90% confidence?


Solution 9.29

(a) $p_s = \dfrac{136}{400} = 0.34$
An estimate of $p$, the proportion of all carpet shops selling below list price, is $\hat{p} = p_s = 0.34$. So an estimate of the percentage of shops is 34%.

(b) An approximate 90% confidence interval for $p$ is given by
$p_s \pm 1.645\sqrt{\dfrac{p_s q_s}{n}} = 0.34 \pm 1.645\sqrt{\dfrac{0.34 \times 0.66}{400}}$
$= 0.34 \pm 0.039$
$= (0.301, 0.379)$
$= (30.1\%, 37.9\%)$

This means that if a large number of intervals were calculated in the same way from random samples of 400 shops, about 90% of them would include the true population percentage. You can therefore be 90% confident that the interval (30.1%, 37.9%) includes the true percentage.

(c) In part (b) the percentage of shops was estimated to within ±3.9%.

You now require $n$ such that the percentage is estimated to within ±2%,
so that $p_s \pm 1.645\sqrt{\dfrac{p_s q_s}{n}} = p_s \pm 0.02$.

Taking the + sign on both sides,
$1.645\sqrt{\dfrac{p_s q_s}{n}} = 0.02$
i.e. $1.645 \times \sqrt{\dfrac{0.34 \times 0.66}{n}} = 0.02$
$\sqrt{n} = \dfrac{1.645 \times \sqrt{0.34 \times 0.66}}{0.02} = 38.96\ldots$
$n = 1520$ (3 s.f.)

1520 shops would have to be sampled.
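Part (c) solves $z\sqrt{p_s q_s / n} = E$ for $n$, which gives $n = (z/E)^2 p_s q_s$. A short sketch with a hypothetical function name, assuming the estimate $p_s = 0.34$ from part (a) is used in place of the unknown $p$:

```python
from math import ceil


def sample_size_for_proportion(p_s: float, half_width: float, z: float) -> float:
    """Solve z * sqrt(p_s * (1 - p_s) / n) = half_width for n."""
    return (z / half_width) ** 2 * p_s * (1 - p_s)


n = sample_size_for_proportion(0.34, 0.02, 1.645)
print(round(n, 2), ceil(n))
```

This gives $n = 1518.08$; rounding up gives 1519 as the smallest whole sample size, which is 1520 to 3 s.f.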


Exercise 9g Confidence intervals for p 


1. In a survey of a random sample of 250 households in a large city, 170 households owned at least one pet.


(a) Find an approximate 95% confidence 
interval for the proportion of households in 
the city that own at least one pet. 

(b) Explain why the interval is approximate. 


2. In order to assess the probability of a successful outcome, an experiment was performed 200 times. The number of successful outcomes was 72.


(a) Find a 95% confidence interval for p, the 
probability of a successful outcome. 
(b) Find a 99% confidence interval for p. 


3. A survey was undertaken of the use of the internet by residents in a large city and it was discovered that in a random sample of 150 residents, 45 logged on to the internet at least once a day.


(a) Calculate an approximate 90% confidence 
interval for p, the proportion of residents in 
the city that log on to the internet at least 
once a day. 

(b) One hundred similar surveys are carried out 
and the 90% confidence interval calculated 
for each survey. State the expected number 
of intervals that include p. 


4. Recruits are issued with boots when they join the army. The last 50 pairs of boots issued were the following sizes:


8 9 8 10 11 8 7 12 42 9 
9 8 11 8 9 7 11 12 11 10 
9 10 10 10 8 8 7 12 9 9 
0137 8 9 9 10 10 8 12 
9 


9 10 10 11 12 9 9 10 9 


(a) Find the proportion in the sample requiring 
size 9, 

(b) Assuming that the last 50 recruits can be regarded as a random sample of all recruits, calculate an approximate 90% confidence interval for the proportion, $p$, of all recruits requiring size 9 boots.

(c) Explain why the interval is approximate. 


5. In a market research survey, 25 people out of a random sample of 100 in a certain town said that they regularly used a particular brand of soap. Find approximate 97% confidence limits for the proportion of people in the town who regularly use this brand of soap.


6. A college principal decides to consult the students about a proposed change in the times of lectures. She finds that, out of a random sample of 80 students, 57 are in favour of the change.

(a) Find an approximate 90% confidence interval for the proportion of students who are not in favour of the change.

(b) State the effect on the width of such a confidence interval when the confidence level is increased.


7. … people were interviewed … white chocolate to milk chocolate.
(a) Calculate an approximate 95% confidence interval for the proportion of the population who prefer white chocolate. State any assumptions you make.
(b) Approximate a% confidence limits for the proportion preferring white chocolate, based on a sample of size 300, are 0.2278 and 0.2922. Calculate
(i) the proportion of the sample of 300 who preferred white chocolate,
(ii) the value of a.

8. The results of a survey showed that 3600 out of 10 000 families regularly purchased a specific weekly magazine.
(a) Find approximate 95% confidence limits for the proportion of families buying the magazine.
(b) Estimate the additional number of families to be contacted if the probability that the … .

9. The probability of success in each of a long series of $n$ independent trials is constant and equal to $p$. Explain how an approximate 95% confidence interval for $p$ may be obtained.

10. In an opinion poll carried out before a local election, 501 people out of a random sample of 925 voters declare that they will vote for a particular one of the two candidates contesting the election. Find approximate 95% confidence limits for the proportion of all voters in favour of this candidate.

Summary

Point estimates: unbiased estimates for
- population mean $\mu$: $\hat{\mu} = \bar{x}$, the sample mean
- population proportion $p$: $\hat{p} = p_s$, the sample proportion
- population variance $\sigma^2$: $\hat{\sigma}^2 = \dfrac{n}{n-1}s^2$ ($s^2$ is the sample variance), or equivalently $\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$
$\hat{\sigma}$ is given by $\sigma_{n-1}$ on your calculator.

Distribution of the sample mean
If a number of random samples, each of the same size $n$, is taken from a parent population and the mean $\bar{x}$ is calculated for each sample, then these means form a distribution called the sampling distribution of means:
(a) when the samples are taken from a normal population $X \sim N(\mu, \sigma^2)$, with $\sigma^2$ known, then for samples of any size $n$, the distribution of $\bar{X}$ is also normal such that $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$. $\dfrac{\sigma}{\sqrt{n}}$ is called the standard error of the mean.
(b) when the samples are taken from a non-normal population with known variance $\sigma^2$, then for large values of $n$, the distribution of $\bar{X}$ is approximately normal such that $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$. This is known as the central limit theorem.
In both these cases, if the population variance, $\sigma^2$, is unknown, then $\hat{\sigma}^2$ can be used instead, provided that $n$ is large.

Distribution of the sample proportion
If a number of random samples, each of the same size $n$, is taken from a parent population and the proportion of successes, $p_s$, calculated for each sample, then these proportions form a distribution called the sampling distribution of proportions.
Provided that $n$ is large, the sampling distribution of proportions is approximately normal such that $P_s \sim N\left(p, \dfrac{pq}{n}\right)$, where $q = 1 - p$.
$\sqrt{\dfrac{pq}{n}}$ is called the standard error of proportion.

Interval estimates

Confidence interval for the population mean $\mu$

Conditions: normal population, with known variance $\sigma^2$; sample size $n$ large or small; sample mean $\bar{x}$.
95% confidence interval for $\mu$: $\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$

Conditions: non-normal population, with known variance $\sigma^2$; sample size $n$ large ($n > 30$); sample mean $\bar{x}$.
95% confidence interval for $\mu$: $\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$

Conditions: normal or non-normal population, with unknown variance $\sigma^2$; sample size $n$ large ($n > 30$); sample mean $\bar{x}$; sample variance $s^2$.
95% confidence interval for $\mu$: $\left(\bar{x} - 1.96\dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$, where $\hat{\sigma}^2 = \dfrac{n}{n-1}s^2$

Conditions: normal population, with unknown variance $\sigma^2$; sample size $n$ small ($n < 30$); sample mean $\bar{x}$; sample variance $s^2$.
95% confidence interval for $\mu$: $\left(\bar{x} - t\dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + t\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$, where $\hat{\sigma}^2 = \dfrac{n}{n-1}s^2$ and $(-t, t)$ encloses 95% of the $t(n - 1)$ distribution

Note that the width of the confidence interval can be reduced in one of these ways:
- by increasing the sample size (making $n$ larger),
- by decreasing the percentage confidence (e.g. choosing a confidence level of 90% instead of 95%),
- by reducing the size of the population variance if possible (making $\sigma$ smaller).

Confidence interval for the population proportion $p$

Conditions: sample size $n$ large; sample proportion $p_s$.
95% confidence interval for $p$: $\left(p_s - 1.96\sqrt{\dfrac{p_s q_s}{n}},\ p_s + 1.96\sqrt{\dfrac{p_s q_s}{n}}\right)$, where $q_s = 1 - p_s$
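The three ways of reducing the width listed in the summary can be seen from the formula $\text{width} = 2z\sigma/\sqrt{n}$. The sketch below uses hypothetical values $\sigma = 8$ and $n = 25$ purely for illustration, and assumes Python 3.8+ for `statistics.NormalDist`; `z_interval_width` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_width(sigma: float, n: int, confidence: float) -> float:
    """Width 2 * z * sigma / sqrt(n) of a symmetric normal-based interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


# Hypothetical starting point: sigma = 8, n = 25, 95% confidence
base = z_interval_width(8, 25, 0.95)
print(z_interval_width(8, 100, 0.95) / base)   # four times the sample size
print(z_interval_width(8, 25, 0.90) / base)    # lower confidence level
print(z_interval_width(4, 25, 0.95) / base)    # half the standard deviation
```

Quadrupling $n$ or halving $\sigma$ halves the width, while dropping from 95% to 90% confidence multiplies it by $1.645/1.96 \approx 0.84$.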

; 


Miscellaneous Worked Examples 


Example 9.30
Each of a random sample of 50 one-pound coins was weighed and their masses, $x$ grams, are summarised by
$\sum x = 474.51$, $\sum x^2 = 4503.8276$.
(a) Use an unbiased estimate of variance to calculate an approximate 90% confidence interval for the mean mass (in grams) of all one-pound coins, giving the end-values of the interval correct to 2 decimal places.
(b) Find the size of a random sample of one-pound coins that would be required to give a 95% confidence interval whose width is half that of the interval calculated in (a).
(c) It was found later that the scales were consistently underweighing by 0.05 grams. State which of the results of (a) and (b) should be amended and which should not. Give the amended values.
(C)


Solution 9.30
X is the mass, in grams, of a one-pound coin.

(a) $\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$
$= \dfrac{1}{49}\left(4503.8276 - \dfrac{474.51^2}{50}\right)$
$= 0.01291\ldots$
$\hat{\sigma} = \sqrt{0.01291\ldots} = 0.1136\ldots$
$\bar{x} = \dfrac{474.51}{50} = 9.4902$

By the central limit theorem, and using $\hat{\sigma}$ since $\sigma$ is unknown, 90% confidence limits for $\mu$ are
$\bar{x} \pm 1.645\dfrac{\hat{\sigma}}{\sqrt{n}} = 9.4902 \pm 1.645 \times \dfrac{0.1136\ldots}{\sqrt{50}}$
$= 9.4902 \pm 0.0264\ldots$

90% confidence interval = (9.46 g, 9.52 g) (2 d.p.).

(b) Width of 90% confidence interval $= 2 \times 1.645\dfrac{\hat{\sigma}}{\sqrt{n}}$
$= 2 \times 0.0264\ldots$
$= 0.05287\ldots$

Width of required interval $= \tfrac{1}{2} \times 0.05287\ldots = 0.0264\ldots$

z-value required for 95% confidence interval = 1.96

$2 \times 1.96 \times \dfrac{0.1136\ldots}{\sqrt{n}} = 0.0264\ldots$
$\sqrt{n} = \dfrac{0.445\ldots}{0.0264\ldots} = 16.85\ldots$
$n = 283.9\ldots$

The sample size required is 280 (2 s.f.).

(c) When the scales are underweighing by 0.05 g,
- the confidence interval in part (a) would be amended. It would be shifted 0.05 units to the right. The new confidence interval would be (9.51 g, 9.57 g),
- the sample size in part (b) would remain the same, since it uses the estimate of the variance, which would not be altered if all the readings were increased by 0.05 g.
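The calculation in parts (a) and (b) can be sketched in Python from the summary statistics alone. This is a minimal sketch; the variable names are illustrative only.

```python
from math import ceil, sqrt

# Summary statistics of Example 9.30
n, sum_x, sum_x2 = 50, 474.51, 4503.8276
sigma_hat = sqrt((sum_x2 - sum_x**2 / n) / (n - 1))   # unbiased estimate of sigma

width_90 = 2 * 1.645 * sigma_hat / sqrt(n)            # width of the 90% interval in (a)
target = width_90 / 2                                 # required width in (b)
n_needed = (2 * 1.96 * sigma_hat / target) ** 2       # solve 2 * 1.96 * sigma_hat / sqrt(n) = target
print(f"sigma_hat = {sigma_hat:.4f}, n = {n_needed:.1f}, round up to {ceil(n_needed)}")
```

Since $\hat{\sigma}$ cancels, $n$ depends only on the two z-values: $n = 50 \times \left(\dfrac{2 \times 1.96}{1.645}\right)^2 \approx 283.9$, so 284 coins when rounded up to a whole number.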


Example 9.31 


Out of a random sample of 1000 French people interviewed during Autumn 1996, 410 
supported a single European Currency. 


(a) Calculate an approximate 99% confidence interval for the population proportion, $p$, of French people who supported a single European Currency.


(b) Estimate the size of a sample that would have provided a 99% confidence interval of 
width 0.04 for p. 


(c) Give one reason (other than rounding) why your answer to (b) is only an estimate. (C) 


Solution 9.31
(a) $p_s = \dfrac{410}{1000} = 0.41$ and $q_s = 1 - p_s = 0.59$.

In a sample of size 1000, the 99% confidence limits for $p$ are
$p_s \pm 2.576\sqrt{\dfrac{p_s q_s}{n}} = 0.41 \pm 2.576 \times \sqrt{\dfrac{0.41 \times 0.59}{1000}}$
$= 0.41 \pm 0.0400\ldots$
99% confidence interval = (0.37, 0.45) (2 s.f.).

(b) For a width of 0.04, the confidence interval would be $0.41 \pm 0.02$,
i.e. $2.576\sqrt{\dfrac{0.41 \times 0.59}{n}} = 0.02$
$\dfrac{1.266\ldots}{\sqrt{n}} = 0.02$
$\sqrt{n} = \dfrac{1.266\ldots}{0.02} = 63.34\ldots$
$n = 4012.9\ldots$

Sample size = 4010 (3 s.f.).

(c) The answer is only an estimate because the estimate for $p$, $\hat{p} = p_s$, was used to obtain an approximate value for the standard deviation of the sampling distribution, $\sqrt{\dfrac{pq}{n}}$.

Also, in the sampling distribution of proportions (from which the confidence interval is obtained), a normal approximation is used for a binomial distribution.
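Part (b) can be checked numerically. Since the width in part (a) is almost exactly 0.08 for $n = 1000$, halving it needs roughly four times the sample size. This sketch, with a hypothetical helper name, solves for $n$ and then confirms the width at the rounded-up $n$:

```python
from math import sqrt


def proportion_width(p_s: float, n: float, z: float = 2.576) -> float:
    """Width 2 * z * sqrt(p_s * q_s / n) of an approximate interval for p."""
    return 2 * z * sqrt(p_s * (1 - p_s) / n)


# Solve 2.576 * sqrt(0.41 * 0.59 / n) = 0.02 for n
n_needed = (2.576 / 0.02) ** 2 * 0.41 * 0.59
print(round(n_needed, 1))
print(round(proportion_width(0.41, 1000), 4))       # width for the sample in part (a)
print(round(proportion_width(0.41, 4013), 4))       # width at the rounded-up sample size
```

As in the text, $p_s = 0.41$ is assumed to stay the same for the larger sample, which is one reason the answer is only an estimate.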


Example 9.32

It may be assumed that the breaking strength of paving slabs laid in public areas is normally distributed with mean 50 units and standard deviation 8 units. Random samples of $n$ paving slabs are taken. The mean breaking strength for a sample is denoted by $\bar{X}$.

(a) State the distribution of $\bar{X}$, giving its mean and variance.

(b) Find the probability that $\bar{X}$ exceeds 54 units in the case $n = 25$.

(c) Find the smallest possible sample size if the probability that $\bar{X}$ exceeds 54 units is less than 0.01.

Suppose that the breaking strength of paving slabs laid in public areas has mean 50 units and standard deviation 8 units, but that the form of the distribution of breaking strengths is not known. Random samples of $n$ paving slabs are taken. What can be said about the form of the distribution of the mean breaking strength of these samples in the case when $n$ is large, and also in the case when $n$ is small? (C)


Solution 9.32

X is the breaking strength, $X \sim N(50, 8^2)$.

(a) $\bar{X} \sim N\left(50, \dfrac{8^2}{n}\right)$,
so $\bar{X}$ follows a normal distribution with mean 50 and variance $\dfrac{64}{n}$.

(b) When $n = 25$, $\bar{X} \sim N\left(50, \dfrac{8^2}{25}\right)$.
Standard deviation of $\bar{X}$ is $\dfrac{8}{5} = 1.6$.
$P(\bar{X} > 54) = P\left(Z > \dfrac{54 - 50}{1.6}\right)$
$= P(Z > 2.5)$
$= 0.0062$

(c) $P(\bar{X} > 54) < 0.01$
From tables, $\Phi(2.326) = 0.99$, so a z-value of 2.326 gives an upper tail probability of 0.01.
So $\dfrac{54 - 50}{8/\sqrt{n}}$ must lie to the right of 2.326, i.e.
$\dfrac{54 - 50}{8/\sqrt{n}} > 2.326$
$\dfrac{\sqrt{n}}{2} > 2.326$
$\sqrt{n} > 4.652$
$n > 21.64\ldots$

Smallest sample size is 22.

When $n$ is large, by the central limit theorem, $\bar{X}$ is approximately normal and $\bar{X} \sim N\left(50, \dfrac{8^2}{n}\right)$.

When $n$ is small, you cannot say what the distribution of $\bar{X}$ is. You only know that its mean is 50 and its variance is $\dfrac{64}{n}$.
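Parts (b) and (c) can be sketched with `statistics.NormalDist` (Python 3.8+). Part (c) is done here by a direct search over $n$, a swapped-in alternative to the algebraic method above that should give the same smallest sample size; `prob_mean_exceeds` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 50, 8   # population mean and standard deviation of breaking strength


def prob_mean_exceeds(threshold: float, n: int) -> float:
    """P(X_bar > threshold) when X_bar ~ N(MU, SIGMA^2 / n)."""
    return 1 - NormalDist(MU, SIGMA / sqrt(n)).cdf(threshold)


print(round(prob_mean_exceeds(54, 25), 4))   # part (b)

# Part (c): increase n until P(X_bar > 54) drops below 0.01
n = 1
while prob_mean_exceeds(54, n) >= 0.01:
    n += 1
print(n)
```

At $n = 21$ the probability is still about 0.011, and at $n = 22$ it falls to about 0.0095, confirming 22 as the smallest sample size.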


Example 9.33 


The ‘reading age’ of children about to start secondary school is a measure of how good they are at reading and understanding printed text. A child’s reading age, measured in years, is denoted by the random variable X. The distribution of X is assumed to be $N(\mu, \sigma^2)$. The reading ages of a random sample of 20 children were measured, and the data obtained is summarised by $\sum x = 232.6$, $\sum x^2 = 2756.22$.

(a) Calculate unbiased estimates of $\mu$ and $\sigma^2$, giving your answers correct to two decimal places.

(b) Calculate a symmetric 95% confidence interval for $\mu$. (C)


Solution 9.33

(a) $\hat{\mu} = \bar{x} = \dfrac{232.6}{20} = 11.63$

$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$
$= \dfrac{1}{19}\left(2756.22 - \dfrac{232.6^2}{20}\right)$
$= 2.688\ldots$
$= 2.69$ (2 d.p.)

(b) Since the population is normal, with variance unknown, and the sample size is small, use the t-distribution.

The 95% confidence limits for $\mu$ are $\bar{x} \pm t\dfrac{\hat{\sigma}}{\sqrt{n}}$,
where $(-t, t)$ encloses the central 95% of the $t(n - 1)$ distribution.

Since $n = 20$, consider $t(19)$.

ν \ p   0.75   0.90   0.95   0.975
1       1.000  3.078  6.314  12.71
⋮
17      0.689  1.333  1.740  2.110
18      0.688  1.330  1.734  2.101
19      0.688  1.328  1.729  2.093

From tables, the critical t-value is found from column 0.975, $\nu = 19$, and it is $t = 2.093$.

95% confidence limits are $11.63 \pm 2.093 \times \dfrac{\sqrt{2.688\ldots}}{\sqrt{20}} = 11.63 \pm 0.767\ldots$

95% confidence interval for $\mu$ $= (11.63 - 0.767\ldots,\ 11.63 + 0.767\ldots)$
= (10.9 years, 12.4 years) (1 d.p.).
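The same steps can be sketched in Python starting from the summary sums, which is how the data are given in Example 9.33. A minimal sketch: `estimates_from_sums` is a hypothetical helper, and the critical value 2.093 is assumed to be read from the $t(19)$ row of the tables.

```python
from math import sqrt


def estimates_from_sums(n: int, sum_x: float, sum_x2: float):
    """Unbiased estimates: mu_hat = sum_x / n, sigma_hat^2 = (sum_x2 - sum_x^2 / n) / (n - 1)."""
    mu_hat = sum_x / n
    var_hat = (sum_x2 - sum_x**2 / n) / (n - 1)
    return mu_hat, var_hat


n = 20
mu_hat, var_hat = estimates_from_sums(n, 232.6, 2756.22)
t_crit = 2.093                              # t(19), column 0.975
half_width = t_crit * sqrt(var_hat / n)     # t * sigma_hat / sqrt(n)
print(f"mu_hat = {mu_hat:.2f}, var_hat = {var_hat:.2f}")
print(f"95% CI: ({mu_hat - half_width:.1f}, {mu_hat + half_width:.1f})")
```

Writing $t\sqrt{\hat{\sigma}^2/n}$ is the same as $t\,\hat{\sigma}/\sqrt{n}$; it simply avoids taking two square roots.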


Miscellaneous exercise 9h 


1. The mass of a certain brand of chocolate bar has a normal distribution with mean $\mu$ grams and standard deviation 0.85 grams. The masses, in grams, of 5 randomly chosen bars are

124.31, 125.14, 124.23, 125.41, 125.76.

Calculate a symmetric 90% confidence interval for $\mu$, giving the end-points correct to two decimal places.

Forty random samples of 5 bars are taken, and a 90% confidence interval for $\mu$ is calculated for each sample. Find the expected number of intervals that do not contain $\mu$. (C)


2. A telephone company selected a random sample 


of size 150 from those customers who had not 
paid their bills one month after they had been 
sent out. The mean amount owed by the 
customers in the sample was £97.50 and the 
standard deviation was £29.00. 

Calculate a 90% confidence interval for the 
mean amount owed by all customers who had 
not paid their bills one month after they had 
been sent out. (AEB) 


3. A catering company asked 50 randomly selected 
college students to state the amount of money, 
$x, which they spent daily on lunch, and the 
results were summarised by $\sum x = 56.50$ and
$\sum x^2 = 66.80$. Calculate unbiased estimates of the
mean and the variance of the amount spent daily 
on lunch by students at the college, giving your 
answers correct to three significant figures. 
Hence find a symmetric 90% confidence interval 
for the mean amount spent daily on lunch, giving 
the end-points correct to the nearest $0.01. 
Justify the use of the normal distribution in 
constructing the confidence interval. (C) 


4. A random sample of 250 adult men undergoing a routine medical inspection had their heights ($x$ cm) measured to the nearest centimetre, and the following data were obtained:

$\sum x = 43\,205$, $\sum x^2 = 7\,469\,107$.

Calculate an unbiased estimate of the population variance. Calculate also a symmetric 99% confidence interval for the population mean.


5. A random sample of 600 was chosen from the adults living in a town in order to investigate the number $x$ of days of work lost through illness. Before taking the sample it was decided that certain categories of people would be excluded from the analysis of the number of working days lost although they would not be excluded from the sample. In the sample 180 were found to be from these categories. For the remaining 420 members of the sample:

$\sum x = 1260$, $\sum x^2 = 46\,000$.

(a) Estimate the mean number of days lost through illness, for the restricted population, and give a 95% confidence interval for the mean.

(b) Estimate the percentage of people in the town who fall into the excluded categories, and give a 99% confidence interval for this percentage.

(c) Give two examples, with reasons, of people who might fall into the excluded categories.

(O)


6. The proportion of bruised apricots in a large consignment is denoted by $p$. A sample of 100 apricots is examined and 11 apricots are found to be bruised.

(a) Give an assumption under which it would be valid to calculate an approximate confidence interval for $p$.

(b) Given that the assumption in part (a) is justified, calculate an approximate 90% confidence interval for $p$. Give the end-points correct to two decimal places. (C)


7. The lifetimes of light bulbs of a certain type have standard deviation 25.3 hours. Each bulb in a randomly chosen box of 12 was tested to failure and the mean lifetime was found to be 1785.7 hours.

(a) State two assumptions which are required so that a symmetric 90% confidence interval for the population mean lifetime of the bulbs can be calculated.

(b) Calculate a symmetric 90% confidence interval, given the validity of the assumptions. The values of the end-points should be given to the nearest integer. (C)


8. A consumer group wishes to estimate the proportion, $p$, of packages of sausages whose fat content is greater than that stated on the label. A random sample of 40 packages was tested and nine packages were found to contain more fat than stated on the label.

(a) Estimate the number of packages that would have to be tested in order that a 95% confidence interval for $p$ should have a width of 0.1.

(b) State, giving a reason, whether the number of packages to be tested would be larger or smaller than the answer in (a) if the confidence level were changed to 90%. (C)


9. In June 1996, 150 randomly chosen people aged sixteen or more were asked whether they smoked cigarettes and 34 said that they did. Assuming that the responses were truthful, calculate an approximate 99% confidence interval for the population proportion of people aged sixteen or more who smoked cigarettes.

Give a reason why this interval might not contain the true population proportion. (C)


10. A certain type of yarn is known to have a breaking strength with a mean of 25 newtons. In an attempt to increase its breaking strength the yarn is treated with a chemical. Each piece of yarn in a random sample of 80 treated pieces has its breaking strength, $x$ newtons, measured, producing the following summarised data:

$\sum x = 2122$, $\sum x^2 = 56\,384$

(a) Obtain unbiased estimates of the mean, $\mu$, and variance, $\sigma^2$, of the breaking strengths of pieces of yarn treated with the chemical.

(b) Construct a symmetric 99% confidence interval for $\mu$.

(c) Hence state, with a reason, whether or not the manufacturer of the yarn is justified in claiming that the treatment increases the mean breaking strength of this type of yarn.

(d) Explain why you were able to construct your confidence interval without knowing the form of the distribution of the breaking strength of a piece of yarn. (NEAB)


11. Shoe shop staff routinely measure the length of their customers’ feet. Measurements of the length of one foot (without shoes) from each of 180 adult male customers yielded a mean length of 29.2 cm and a standard deviation of 1.47 cm.

(a) Calculate a 95% confidence interval for the mean length of male feet.

(b) Why was it not necessary to assume that the lengths of feet are normally distributed in order to calculate the confidence interval in part (a)?

(c) What assumption was it necessary to make in order to calculate the confidence interval in part (a)?

(d) Given that the lengths of male feet may be modelled by a normal distribution, and making any other necessary assumptions, calculate an interval within which 90% of the lengths of male feet will lie.

(e) In the light of your calculations in parts (a) and (d), discuss, briefly, the question ‘is a foot a foot long?’ (One foot is 30.5 cm.)

(AEB)


12. Before its annual overhaul, the mean operating
time of an automatic machine was 103 seconds. 
After the annual overhaul, the following random 
sample of operating times (in seconds) was 
obtained: 


90, 97, 101, 92, 101, 95, 95, 98, 96, 95. 


Assuming that the time taken by the machine to 
perform the operation is a normally distributed 
random variable with a known standard 
deviation of 5 seconds, find 98% confidence 
limits for the mean operating time after the 
overhaul. 

Comment on the magnitude of these limits 
relative to the mean operating time before the 
overhaul. (AEB) 


13. Packets of baking powder have a nominal weight
of 200 g. The distribution of weights is normal
and the standard deviation is 7 g. Average
quantity system legislation states that, if the
nominal weight is 200 g,

(i) the average weight must be at least 200 g.

(ii) not more than 2.5% of packages may weigh
less than 191 g.

(iii) not more than 1 in 1000 packages may
weigh less than 182 g.

A random sample of 30 packages had the
following weights:
218, 207, 214, 189, 211, 206, 203, 217,
183, 186, 219, 213, 207, 214, 203, 204,
195, 197, 213, 212, 188, 221, 217, 184,
186, 216, 198, 211, 216, 200.


(a) Calculate a 95% confidence interval for
the mean weight.

(b) Find the proportion of packets in the
sample weighing less than 191 g and
use your result to calculate an
approximate 95% confidence interval
for the proportion of all packets
weighing less than 191 g. (AEB)


14. A company manufactures bars of soap. In a 
random sample of 70 bars, 18 were found to be 
mis-shaped. Calculate an approximate 99% 
confidence interval for the proportion of 
mis-shaped bars of soap. 

Explain what you understand by a 
99% confidence interval by considering 


(a) intervals in general based on the above
method,
(b) the interval you have calculated. 


The bars of soap are either pink or white in 
colour and differently shaped according to 
colour. The masses of both types of soap are 
known to be normally distributed, the mean 
mass of the white bars being 176.2 g. The 
standard deviation for both bars is 6.46 g. A 
sample of 12 of the pink bars of soap had 
masses, measured to the nearest gram, as 
follows: 


174, 164, 182, 169, 171, 187,
176, 177, 168, 171, 180, 175.


Find a 95% confidence interval for the mean
mass of pink bars of soap.

Calculate also an interval within which
approximately 90% of the masses of the white
bars of soap will lie. (AEB)


15. An experimental physicist needs to estimate the
true viscosity, μ pascal seconds (Pa s), of a light
machine oil. Using the same apparatus he takes
12 independent measurements, x Pa s, of the
viscosity of the oil, obtaining the values below:


25.8, 25.2, 24.7, 25.5, 25.3, 25.4, 
25.2, 25.3, 25.8, 25.9, 25.2, 24.9. 


(Σx = 304.2, Σx² = 7712.9)


When using this apparatus, measurements of the
oil’s viscosity are distributed with mean μ and
variance σ².
Obtain unbiased estimates of μ and σ². Hence
obtain a symmetric 95% confidence interval for
μ. State any distributional assumptions you have
made in obtaining your confidence interval.
The physicist explained the meaning of his
confidence interval by saying there was a
probability of 0.95 that μ lay between the limits
of the interval. Explain why this interpretation is
wrong and provide a correct explanation of 95%
confidence as used in this context.
The manufacturer of the oil quotes a viscosity of
25.5 Pa s for the oil. With reference to your
confidence interval, state any conclusion you can
come to regarding the validity of this figure.
(NEAB)


16. Three weeks before an election in a certain
constituency an opinion poll was conducted
using a random sample of 800 voters selected
from the electoral roll. The numbers of persons
who said they would vote for parties A, B, C are
recorded below; the remainder were categorised
as ‘Don’t know’.


Party A    Party B    Party C    Don’t know
  264        256        144         136


(a) Calculate an approximate 90% symmetric
confidence interval for the proportion of the
total electorate in the constituency that will
vote for party A in the election.

(b) Give a very brief description of how the
sample might have been selected, to ensure
that it was random.

(c) In the actual election, 41% of the total
electorate voted for party A. Give two
possible explanations for the fact that this
value is not contained within the confidence
interval calculated in (a). (NEAB)


17. In an investigation to assess the difference in use
between a credit card and a store card a random
sample of 20 people, each using both cards, was
selected. They supplied information from which,
in 1994, the difference between each person’s
mean monthly spending on the credit and store
cards, £d, was calculated. The following
summary data were then calculated.


Σd = 1664 and Σd² = 426 445.


Stating all necessary distributional assumptions,
calculate a symmetric 90% confidence interval
for the mean difference between the mean
monthly spending for all users of the two cards.


(NEAB)


18. The mass, x milligrams, of each of 10 randomly
selected units of a new cancer drug was
measured and the following results obtained:


35.9, 35.2, 35.0, 34.9, 35.4,
34.8, 35.0, 35.1, 35.3, 35.1.


Assuming that the masses are normally
distributed with mean μ, calculate an 80%
confidence interval for μ.


19. Ten random samples of nylon fibre were tested
for the amount of stretching under tension.
Each fibre had the same length and diameter and
was stretched by applying a standard load.
The increases in length, in millimetres, were as
follows:


13.52, 14.06, 13.19, 14.77, 12.80,
12.06, 15.12, 14.39, 15.81, 13.38.


Calculate a 95% confidence interval for the
mean increase in length of the population of
fibres, assuming that the increase in length can
be modelled by a normal distribution.


20. During a particular evening, 10 babies were born
on a particular maternity ward in a large
hospital. The lengths, in centimetres, of the
babies were noted:


50, 51, 45, 47, 49, 48, 54, 53, 45, 50.
Assuming that the sample came from an
underlying normal population, calculate a
95% confidence interval for the mean of the
population.


21. The external diameters (measured in units of
0.01 mm above a nominal value) of a sample of
piston rings produced on the same machine were:


11, 9, 32, 18, 29, 1, 21, 19, 6.


Assuming a normal distribution, calculate a 95%
confidence interval for the population mean.


(AEB)


Mixed test 9A


1. A random sample of 40 nails is drawn from a
population whose lengths are normally
distributed with mean μ mm and standard
deviation 0.48 mm.


(a) Calculate the width of a symmetric
99% confidence interval for μ based on
this sample.

(b) Find the confidence level of a symmetric
confidence interval having the same width as
before, but based on a random sample of
20 nails. (C)


2. From time to time a firm manufacturing
pre-packed furniture needs to check the mean
distance between pairs of holes drilled by
machine in pieces of chipboard to ensure that no
change has occurred. It is known from
experience that the standard deviation of the
distance is 0.43 mm. The firm intends to take a
random sample of size n, and to calculate a
99% confidence interval for the mean of the
population. The width of this interval must be no
more than 0.60 mm. Calculate the minimum
value of n. (L)


3. Out of 248 cars parked in a car park, 72 were 
fitted with an anti-theft device on the steering 
wheel. Assuming that the cars form a random 
sample of parked cars, calculate an approximate 
95% confidence interval for the population 
proportion of parked cars fitted with an 
anti-theft device on the steering wheel. 

Give a reason, other than rounding in the
calculations, why the interval is approximate.
Give a reason why the assumption of
randomness might not be valid. (C)


4. The fat content of a well-known brand of
beefburger was investigated by measuring the
percentage of fat, X, in each of 12 randomly
selected beefburgers. The results were
summarised as follows:


Σx = 228, Σx² = 4448.


Assuming the percentage fat content to be
normally distributed, find a 90% confidence
interval for the population mean μ.
Mixed test 9B


1. A group of 65 students is asked to guess the
length of a particular object and their answers
are recorded as x cm, with the following results:


Σx = 6019.0 and Σx² = 557 733.8.


(a) Show that the estimated standard error of
the sample mean is 0.3 cm.

(b) Determine an approximate symmetric 95%
confidence interval for the mean of the
population of all such guesses, giving your
limits correct to two decimal places.

(c) State one assumption which you have made
in your calculations. (NEAB)


2. A survey was carried out by a County Meals
Service in order to gauge the response to a new
‘healthy eating’ menu. A random sample of 200
schoolchildren was selected from schools using
the menu and it was found that 84 children
approved of it. Calculate an approximate 95%
confidence interval for the population
proportion, p, who approve of the new menu.

It is given that p = 0.38. Use a suitable
approximation to calculate the probability that,
in a random sample of 200 children, the
proportion who approve of the new menu will be
at least 0.42. (C)


3. A researcher is designing a study to standardise a 
new intelligence test. It is known that scores on 
this type of test are normally distributed with a 
standard deviation of 15.0. 


(a) Write down, in terms of x̄, the sample mean,
and n, the sample size, an expression for a
99% symmetric confidence interval for the
mean test score.

(b) Calculate, to the nearest 100, the value of n
such that the width of this confidence
interval will be less than 1.0. (NEAB)


4. In Tesbury’s supermarket, economy packs of 
butter are marked 250 g. An inspector takes a 
random sample of 12 packs and weighs them. 
Correct to the nearest 0.1 g, the weights, in 
grams, were: 


246.5, 240.9, 245.3, 250.5, 248.7, 249.1, 
251.0, 249.8, 249.8, 247.6, 246.2, 241.4.


(a) Making any necessary assumptions, which 
should be stated, calculate a 99% 
confidence interval for the mean weight of 
the packs of butter. 

(b) Calculate the width of the 99% confidence 
interval. 

(c) How is the width affected when calculating 
a 90% confidence interval? 


10 


Hypothesis tests: discrete distributions 


In this chapter you will learn about

• the language of hypothesis testing
• how to perform a test
  - for the parameter p of a binomial distribution (small sample)
  - for the mean λ of a Poisson distribution
• Type I and Type II errors associated with hypothesis tests


Background knowledge
You will need to be able to

• recognise the conditions needed for a situation to be modelled by a binomial distribution
  or a Poisson distribution,
• find related probabilities by direct evaluation or by using cumulative probability tables.


HYPOTHESIS TEST FOR A BINOMIAL PROPORTION, p
(small sample size)


Sid says that he has psychic powers and can read people’s thoughts. To test this claim, a
volunteer from the audience sits on the stage while Sid sits in a separate room off stage. The
volunteer chooses a card from a well-shuffled pack and concentrates on the card for five
seconds. At the same time, Sid writes down the suit of the card, either hearts, diamonds,
spades or clubs. The card is replaced in the pack, the pack is shuffled and another card drawn.
The procedure is repeated until 20 cards have been drawn.


Since there are four suits, Sid has a one in four chance of writing down the correct suit if he
is just guessing. If he isn’t guessing, you would expect him to get more than one in four
correct. If he gets five (or fewer) correct answers out of the 20, you would definitely say
that he is just guessing, but if he gets as many as 19 or 20 correct you would have no
hesitation in saying that he could read people’s thoughts.

But what about other values? If he gets 12 correct answers, would this be very unusual? What
would you say if he got 10 correct? What about 8 correct?




Somehow you have to decide on a cut-off point, c. This would be the least value you could 
find such that the probability of getting c or more correct answers would be very small. It 
would be considered a rare event to get c or more correct answers. 


[Diagram: number line of correct answers from 0 to 20, with c - 1 and c marked; values from c upwards are regarded as rare events.]


To decide on the value of c, you could choose a number that seemed reasonable. If, however,
you perform a hypothesis (or significance) test, you will be able to back up your argument and
conclusion with statistical theory.


Suppose that X is the number of correct answers that Sid writes down for the suits of the 20
cards. If you assume that Sid is just guessing, the probability that he writes down the correct
suit is 0.25. The experiment is performed 20 times, so there are 20 independent trials, each
with a probability of 0.25 of success. This suggests a binomial situation (see page 279). In
fact, on the assumption that Sid is guessing, X can be modelled by a binomial distribution
with n = 20 and p = 0.25, i.e. X ~ B(20, 0.25).


You now need to look for a value, c, in this distribution such that P(X ≥ c) is very small.
Binomial probabilities can be calculated directly (see page 279) or found from cumulative
probability tables which give P(X ≤ r) for various values of n and p, where X ~ B(n, p). The
extract here relates to X ~ B(20, 0.25) and has been reproduced from page 646.


P(X ≤ r) for X ~ B(20, 0.25)

 r    P(X ≤ r)
 0    0.0032
 1    0.0243
 2    0.0913
 3    0.2252
 4    0.4148
 5    0.6172
 6    0.7858
 7    0.8982
 8    0.9591
 9    0.9861
10    0.9961
11    0.9991
12    0.9998
13    1.0000

The tables give probabilities to four decimal places, indicating that, to four decimal places,
P(X ≤ 13) = 1.0000. This implies that P(X ≥ 14) = 1 − P(X ≤ 13) is 0 to four decimal places.
If he is just guessing, it would be almost impossible for Sid to give 14, 15, 16, 17, 18, 19 or
20 correct answers. So if he gives, for example, 14 correct answers you would certainly have
to conclude that he is able to read people’s thoughts in some way!

Similarly, P(X ≥ 13) = 1 − P(X ≤ 12) = 1 − 0.9998 = 0.0002.
Getting 13 or more correct answers would be a very rare event.

P(X ≥ 12) = 1 − P(X ≤ 11) = 1 − 0.9991 = 0.0009.
Getting 12 or more correct answers is still a very rare event.

P(X ≥ 11) = 1 − P(X ≤ 10) = 1 − 0.9961 = 0.0039.
On about four occasions in every thousand Sid might give 11 or more correct answers. This is
still a rare event.

P(X ≥ 10) = 1 − P(X ≤ 9) = 1 − 0.9861 = 0.0139 ≈ 1%.
It would be very unlikely for Sid to give 10 or more correct answers, but it could happen on
about one occasion in every hundred.

P(X ≥ 9) = 1 − P(X ≤ 8) = 1 − 0.9591 = 0.0409 ≈ 4%.
It would still be unlikely for Sid to give 9 or more correct answers.

P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − 0.8982 = 0.1018 ≈ 10%.
This probability is not that small. If Sid is just guessing, on 10% of occasions he could give
8 or more correct answers.
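The cumulative values in the table, and the upper-tail probabilities worked out from them, can be checked with a few lines of Python. This is a minimal sketch that assumes Python 3.8 or later for `math.comb`. The helper names `binom_pmf` and `binom_cdf` are illustrative and do not come from the text.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def binom_cdf(r: int, n: int, p: float) -> float:
    """P(X <= r): the quantity listed in cumulative binomial tables."""
    return sum(binom_pmf(k, n, p) for k in range(r + 1))


n, p = 20, 0.25
for c in range(7, 15):
    # P(X >= c) = 1 - P(X <= c - 1), as in the working above
    print(f"P(X >= {c:2d}) = {1 - binom_cdf(c - 1, n, p):.4f}")
```

Summing the probability function directly removes the need for printed tables, at the cost of a little arithmetic.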


You have to make a decision about the value of the probability that is considered to imply an
unlikely or rare event. This probability is called the significance level of the test. As a guide,
events that have a probability of 5% or less are regarded as unlikely and events having a
probability of 1% or less are regarded as very unlikely. Often a significance test at the 5%
level is carried out.


The cut-off point c is known as the critical value and the group of observations that are
considered to be unusual or unlikely (rare) events is called the critical (or rejection) region.
The critical value and critical region depend on the significance level chosen.


Suppose you choose a significance level of 5% to test Sid’s claim. From the working above,
P(X ≥ 8) ≈ 10%. This is greater than 5%, so x ≥ 8 is not the critical region; getting eight or
more correct answers would not be considered an unlikely or rare event.


But P(X ≥ 9) ≈ 4%, which is less than 5%, so getting nine or more correct answers would be
considered an unlikely or rare event. Therefore the critical value for a 5% level of significance
is 9 and the critical region is x ≥ 9, i.e. 9, 10, 11, 12, ..., 19 or 20 correct answers.


[Diagram: number line from 0 to 20 with the critical region x ≥ 9 (9, 10, ..., 20) marked at the upper end.]
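The search for the critical value described above can be sketched in Python: try values of c in increasing order and stop at the first one whose upper-tail probability is no more than the significance level. The function names are hypothetical, and the sketch assumes Python 3.8 or later for `math.comb`.

```python
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p), summing the probabilities from c up to n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def upper_critical_value(n: int, p0: float, alpha: float):
    """Smallest c with P(X >= c) <= alpha when H0: p = p0 is true.

    P(X >= c) decreases as c increases, so the first c that passes is the
    critical value; None means no upper-tail critical region exists.
    """
    for c in range(n + 1):
        if upper_tail(c, n, p0) <= alpha:
            return c
    return None


c = upper_critical_value(20, 0.25, 0.05)
print(c, round(upper_tail(c, 20, 0.25), 4))
```

Changing `alpha` to 0.01 shows how a stricter significance level pushes the critical value further into the upper tail.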


The language used in hypothesis testing 


The assumption that Sid is guessing is called the null hypothesis and it is written H₀. The null
hypothesis is very important as it provides the model for the calculations. You would write
H₀: p = 0.25


If Sid has psychic powers, then he should get more than one in four correct and the
probability that he gives the correct suit will be more than 0.25. This is called the alternative
hypothesis and is denoted by H₁. You would write
H₁: p > 0.25


Since you are interested in whether the probability is greater than 0.25, the critical region in 
this example is at the right-hand end of the distribution. This is known as the upper tail. 


The variable X, the number of correct answers, is the test statistic. The number of correct
answers that Sid gives in the experiment is the test value. To perform the hypothesis test you
need to work out whether the test value lies in the critical region or not.


If the test value lies in the critical (or rejection) region, reject H₀ in favour of H₁. This means
that you will reject the hypothesis that Sid is guessing in favour of the alternative hypothesis
that he has psychic powers.




If the test value does not lie in the critical region, do not reject H₀. There is not enough
evidence to say that he has psychic powers. Writing this another way, if the test value lies in
the acceptance region, accept H₀ and conclude that Sid is guessing.


Suppose that in the experiment Sid gives seven correct answers and he says that this proves
that he has psychic powers. Is this enough evidence statistically? From the critical region
diagram, you can see that, at the 5% level of significance, x = 7 does not lie in the critical
region. Therefore H₀ is not rejected. This means that there is not enough evidence to say that
p > 0.25, i.e. to say Sid has psychic powers. You would conclude that he is just guessing.


PROCEDURE FOR CARRYING OUT A HYPOTHESIS TEST 


To find whether the test value is in the critical region you can work out the critical region as 
described above. This is a useful method as it gives a lot of information, but its disadvantage 
is that it can be rather time-consuming. 


In this example, it may be quicker instead to calculate the probability that X is greater than or
equal to the test value. If this probability is 5% or less, the test value is in the upper tail 5%
of the distribution, i.e. it is in the critical region.


This method is illustrated in the working below, which tests the sample value x = 7 and
assumes that you have not found the critical region first. Note that the stages of the test are
numbered and labelled, and additional commentary is included with the working.


1. Define the variable.
Let X be the number of correctly identified suits out of the 20 trials. Assuming that the pack is
well shuffled between each trial and the trials are independent, X can be modelled by a
binomial distribution, where X ~ B(20, p).

2. State H₀ and H₁.
H₀: p = 0.25 (Sid is guessing)
H₁: p > 0.25 (Sid has psychic powers)

3. State the distribution according to the null hypothesis.
If H₀ is true, then X ~ B(20, 0.25).

4. State level and type of test.
Use a one-tailed (upper tail) test, at the 5% level.

5. State the criterion for rejection.
The test value, x, will lie in the critical region (the upper tail 5% of the distribution) if
P(X ≥ x) ≤ 5%.
Reject H₀ if P(X ≥ x) ≤ 5%, where x is the test value.

6. Obtain the test value.
Sid gives 7 correct answers, so test x = 7 and find P(X ≥ 7).

7. Calculate the required probability.
From cumulative binomial tables (P(X ≤ r) for X ~ B(20, 0.25)),
P(X ≥ 7) = 1 − P(X ≤ 6)
         = 1 − 0.7858
         = 0.2142
         ≈ 21%

If you do not have cumulative tables, work out the binomial probabilities as follows:
P(X ≥ 7) = 1 − P(X ≤ 6)
         = 1 − (0.75²⁰ + ²⁰C₁ × 0.75¹⁹ × 0.25 + ²⁰C₂ × 0.75¹⁸ × 0.25²
            + ²⁰C₃ × 0.75¹⁷ × 0.25³ + ²⁰C₄ × 0.75¹⁶ × 0.25⁴
            + ²⁰C₅ × 0.75¹⁵ × 0.25⁵ + ²⁰C₆ × 0.75¹⁴ × 0.25⁶)
         = 0.2142
         ≈ 21%

8. Make your conclusion.
Since P(X ≥ 7) > 5%, the test value x = 7 is not in the critical region. There is not
enough evidence to reject H₀.
You would conclude that Sid is just guessing; he does not have psychic powers.
NOTE: when you are testing the value x = 7, it may seem strange that you have to work out
P(X ≥ 7) rather than just P(X = 7). Remember that this is necessary as you are essentially
looking for the critical region to see whether the test value lies in this region or not.


The probabilities and critical region can be illustrated diagrammatically. Below is the
probability distribution for X ~ B(20, 0.25). Note that it is positively skewed and the
probabilities for 12 to 20 correct answers are so small that they cannot be shown on the
diagram. The test value has been circled.


[Diagram: probability distribution of X ~ B(20, 0.25) for x = 0 to 20, with the test value x = 7 circled and the boundary for the 5% critical region marked.]


Since P(X ≥ 8) ≈ 10% and P(X ≥ 9) ≈ 4%, the 5% boundary comes between 8 and 9. Note
that with discrete distributions you will probably not get exactly 5% in your calculations.
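Stages 5 to 8 of the test above can be condensed into a small function. This is a hedged sketch rather than a standard routine: `upper_tail_test` is an illustrative name, it assumes Python 3.8 or later, and it uses the rejection criterion P(X ≥ x) ≤ α stated in the working.

```python
from math import comb


def upper_tail_test(x: int, n: int, p0: float, alpha: float):
    """One-tailed (upper tail) test of H0: p = p0 against H1: p > p0.

    Returns (P(X >= x) under H0, True if H0 is rejected), using the
    criterion 'reject H0 if P(X >= x) <= alpha'.
    """
    # P(X >= x) = 1 - P(X <= x - 1); range(x) runs over k = 0, 1, ..., x - 1
    prob = 1 - sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x))
    return prob, prob <= alpha


prob, reject = upper_tail_test(x=7, n=20, p0=0.25, alpha=0.05)
print(f"P(X >= 7) = {prob:.4f}; reject H0: {reject}")
```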


Example 10.1 


A drugs company produced a new pain-relieving drug for migraine sufferers and stated that
the drug had a 90% success rate. A doctor doubted whether the drug would be as successful
as the company claimed. She prescribed the drug for 15 of her patients. After six months,
11 of these patients said that their migraine symptoms had been relieved.


(a) Test the drug company’s claim, at the 5% level of significance.
(b) Should the doctor continue to prescribe the drug?
(a) Let X be the number of patients in 15 whose symptoms are relieved by the
drug. Assuming that the effect of the drug on a patient is independent of its
effect on other patients, X can be modelled by a binomial distribution, where
X ~ B(15, p).

H₀: p = 0.9 (The success rate of the drug is 90%.)
H₁: p < 0.9 (The success rate is less than 90% and the drug is not as successful as
the company claims.)

If H₀ is true, then X ~ B(15, 0.9).

Since the alternative hypothesis is p < 0.9, the critical region is in the lower tail
of the distribution, so use a one-tailed (lower tail) test, at the 5% level.

The test value, x, will lie in the critical region (the lower tail 5% of the
distribution) if P(X ≤ x) ≤ 5%.
Reject H₀ if P(X ≤ x) ≤ 5%, where x is the test value.

The test value is x = 11, so find P(X ≤ 11).
Using cumulative binomial tables, if the tables give only values of p up to 0.5,
you need to use symmetry properties as illustrated on page 284. If at most 11 of the
15 patients are relieved, then at least 4 are not relieved, and each patient is not
relieved with probability 0.1, so

P(X ≤ 11 | p = 0.9) = P(X ≥ 4 | p = 0.1)
                    = 1 − P(X ≤ 3)
                    = 1 − 0.9444
                    = 0.0556

P(X ≤ r) for X ~ B(15, 0.1)

 r    P(X ≤ r)
 0    0.2059
 1    0.5490
 2    0.8159
 3    0.9444
 4    0.9873
 5    0.9978

If you are calculating the probabilities directly:
P(X ≤ 11) = 1 − P(X ≥ 12)
          = 1 − (¹⁵C₁₂ × 0.1³ × 0.9¹² + ¹⁵C₁₃ × 0.1² × 0.9¹³
             + 15 × 0.1 × 0.9¹⁴ + 0.9¹⁵)
          = 1 − 0.944...
          = 0.0556 (4 d.p.)
          ≈ 5.6%

P(X ≤ 11) is greater than 5%. This means that the boundary for the critical region
(the lower tail 5% of the distribution) will be slightly to the left of x = 11. So
x = 11 is not in the critical region.
H₀ is not rejected and the drugs company’s claim of a 90% success rate is
upheld.

(b) P(X ≤ 11) is only just greater than 5%. With safety in mind, it would be wise for
the doctor to err on the side of caution and carry out further tests before accepting
that the success rate is 90%.
It is time-consuming to draw a probability diagram when carrying out the hypothesis test, but
it can be helpful in illustrating the results. The distribution X ~ B(15, 0.9) is very negatively
skewed, with the probabilities in the lower tail being too small to show in the diagram!

[Diagram: probability distribution of X ~ B(15, 0.9) for x = 9 to 15, with the test value x = 11 circled and the boundary for the 5% critical region marked.]
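The symmetry step used in the example can be checked numerically. The sketch below computes P(X ≤ 11) for p = 0.9 directly, and again through the number of patients who are not relieved. It assumes Python 3.8 or later, and `cdf` is an illustrative helper.

```python
from math import comb


def cdf(r: int, n: int, p: float) -> float:
    """P(X <= r) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


n, x = 15, 11
# Direct: P(X <= 11) with p = 0.9
direct = cdf(x, n, 0.9)
# Symmetry: n - X patients are not relieved, each with probability 0.1.
# X <= 11 exactly when n - X >= 4, so the probability is 1 - P(n - X <= 3).
via_symmetry = 1 - cdf(n - x - 1, n, 0.1)

print(f"{direct:.4f} {via_symmetry:.4f} reject H0: {direct <= 0.05}")
```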


ONE-TAILED AND TWO-TAILED TESTS


One-tailed test


In the examples so far, one-tailed tests have been considered, with either the upper or lower
tail being used for the critical region, depending on the alternative hypothesis.


In general, for a significance level of a%, null hypothesis H₀: p = p₀ and a test value x,

- if H₁ involves a > sign, indicating that you are looking for an increase in p, use the upper
  tail to find whether P(X ≥ x) ≤ a%,

- if H₁ involves a < sign, indicating that you are looking for a decrease in p, use the lower tail
  to find whether P(X ≤ x) ≤ a%.


Remember that in both cases you use P(X ...) ≤ a%.


Two-tailed test 


A two-tailed test is carried out when the alternative hypothesis looks for a change in p, not
specifically an increase or a decrease. If the significance level is a%, then the critical region is
in two parts, half in the lower tail and half in the upper tail.


If H₁ involves a ≠ sign, indicating that you are looking for a change in p,

- in the lower tail, the critical region consists of values less than or equal to c₁, such that
  P(X ≤ c₁) ≤ ½a%,

- in the upper tail, the critical region consists of values greater than or equal to c₂, such that
  P(X ≥ c₂) ≤ ½a%.


For example, for a 5% significance level (two-tailed test) the probability distribution and
critical region might look like this:

[Diagram: a probability distribution with the critical region split into a lower tail of 2½% and an upper tail of 2½%.]


Example 10.2 


In last year’s local elections, the Purple party gained 35% of the vote. Prior to this year’s 
election, the party asked a researcher to find out whether support for the party had changed.
Out of twelve voters selected at random, one said that he would vote for the Purple party. 


(a) Test, at the 10% level, whether support for the Purple party has changed. 
(b) Find the critical region for the test. 


Solution 10.2 


(a)
1. Define the variable.
Let X be the number of voters in 12 who say that they will vote for the Purple party.
Assuming that each person votes independently, X can be modelled by a binomial
distribution, where X ~ B(12, p).

2. State H₀ and H₁.
H₀: p = 0.35 (The support has not changed)
H₁: p ≠ 0.35 (The support has changed)

3. State the distribution according to H₀.
If H₀ is true, then X ~ B(12, 0.35).

4. State level and type of test.
Since the alternative hypothesis is p ≠ 0.35, consider both tails of the
distribution and perform a two-tailed test at the 10% level.
In this case the 10% for the significance level is distributed evenly between
the upper and lower tails, with 5% at each tail.

5. State the criterion for rejection.
The test value, x, will lie in the critical region (the lower tail 5% or the
upper tail 5%) if P(X ≤ x) ≤ 5% or P(X ≥ x) ≤ 5%.
Reject H₀ if P(X ≤ x) ≤ 5% or P(X ≥ x) ≤ 5%.

6. Obtain the test value.
The test value is x = 1, so you need to look at the lower tail part of the
critical region and find P(X ≤ 1).

7. Calculate the required probability.
P(X ≤ 1) = P(X = 0) + P(X = 1)
         = 0.65¹² + 12 × 0.65¹¹ × 0.35
         = 0.04244...
         ≈ 4.2%
(You can use cumulative binomial tables if they are available.)

8. Make your conclusion.
Since P(X ≤ 1) ≤ 5%, the sample value x = 1 lies in the critical region, so H₀
is rejected in favour of H₁. At the 10% significance level, there is evidence
that support for the Purple party has changed.

(b) To find the critical region, consider separately the upper and lower tails of the
distribution.

Critical region in lower tail:
Find the maximum value of c such that P(X ≤ c) ≤ 0.05.

You already know that P(X ≤ 1) ≤ 0.05, i.e. that 0 and 1 lie in the critical region.

P(X ≤ 2) = 0.0424... + P(X = 2)
         = 0.0424... + ¹²C₂ × 0.65¹⁰ × 0.35²
         = 0.1512...
         ≈ 15%
P(X ≤ 2) is greater than 5%, indicating that x = 2 is not in the critical region.
So the lower tail part of the critical region consists of x ≤ 1.

Critical region in upper tail:
Find the minimum value of c such that P(X ≥ c) ≤ 0.05. By guesswork, try c = 8:
P(X ≥ 8) = ¹²C₈ × 0.65⁴ × 0.35⁸ + ¹²C₉ × 0.65³ × 0.35⁹ + ¹²C₁₀ × 0.65² × 0.35¹⁰
           + 12 × 0.65 × 0.35¹¹ + 0.35¹²
         = 0.0255... < 5%

This indicates that x ≥ 8 is in the critical region.

But is 8 the smallest value in the critical region? To check this, try c = 7:
P(X ≥ 7) = ¹²C₇ × 0.65⁵ × 0.35⁷ + 0.0255...
         = 0.084... > 5%

This indicates that x = 7 is not in the critical region.
So the upper tail part of the critical region consists of x ≥ 8.

Therefore the critical region is x = 0, 1, 8, 9, 10, 11, 12.


[Diagram: probability distribution of X ~ B(12, 0.35) for x = 0 to 12, showing the critical region x = 0, 1 in the lower tail (5%) and x = 8, 9, ..., 12 in the upper tail (5%).]
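A critical region for a two-tailed test can be found by checking every possible value of x, with half the significance level allowed in each tail. The sketch below applies that rule to Example 10.2. The function name is hypothetical, and Python 3.8 or later is assumed for `math.comb`.

```python
from math import comb


def pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_tailed_critical_region(n: int, p0: float, alpha: float) -> list:
    """Critical region for H0: p = p0 against H1: p != p0 at total level alpha.

    Half of alpha goes in each tail: x is in the lower part if P(X <= x) <= alpha/2
    and in the upper part if P(X >= x) <= alpha/2, both calculated under H0.
    """
    half = alpha / 2
    lower = [x for x in range(n + 1)
             if sum(pmf(k, n, p0) for k in range(x + 1)) <= half]
    upper = [x for x in range(n + 1)
             if sum(pmf(k, n, p0) for k in range(x, n + 1)) <= half]
    return lower + upper


print(two_tailed_critical_region(12, 0.35, 0.10))
```

Checking every value is wasteful for large n, but it mirrors the hand method of testing each candidate c in turn.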


• Define X, the binomial variable being considered, and the general form of its distribution,
  for example X ~ B(12, p).
• State the null hypothesis H₀ and the alternative hypothesis H₁ concerning p, for example
  H₀: p = p₀
  H₁: p < p₀
• State the distribution of X assuming that the null hypothesis is true, i.e. that X does
  follow a binomial distribution with the value of p specified in H₀, for example
  X ~ B(12, p₀).
• State the type of test (one-tailed or two-tailed). This depends on whether the alternative
  hypothesis looks for an increase or a decrease (one-tailed) or a change (two-tailed) in p,
  for example,
  - H₀: p = p₀, H₁: p < p₀ indicates a one-tailed test (lower tail considered for critical region),
  - H₀: p = p₀, H₁: p > p₀ indicates a one-tailed test (upper tail considered for critical region),
  - H₀: p = p₀, H₁: p ≠ p₀ indicates a two-tailed test (both tail ends considered for critical
    region).
• State the significance level of the test, a%. Remember that this defines the critical region.
• State the criterion for rejection of H₀, for example, for test value x,
  - reject H₀ if P(X ≤ x) ≤ a%, i.e. if x lies in the lower tail a% of the distribution,
  - reject H₀ if P(X ≥ x) ≤ a%, i.e. if x lies in the upper tail a% of the distribution,
  - reject H₀ if P(X ≤ x) ≤ ½a% or if P(X ≥ x) ≤ ½a%, i.e. if x lies in the lower or
    upper tail ½a% of the distribution.
• Obtain the test value, x.
• Calculate the required probability to see whether x lies in the critical (rejection) region.
• Make your conclusion by rejecting H₀ or not. Then relate this to the context of the
  situation being tested.

NOTE: The method is essentially the same for a large sample, but, in the case of large
samples, use is made of the normal approximation to the binomial distribution. This test is
described on page 528.
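The steps summarised above can be collected into one illustrative function that covers all three forms of H₁. This is a sketch under the assumptions of this chapter: a small sample and exact binomial probabilities. `binomial_test` is a hypothetical name, not a library routine. It returns the tail probability that is compared with the criterion, together with the decision.

```python
from math import comb


def binomial_test(x: int, n: int, p0: float, alpha: float, alternative: str):
    """Small-sample test of H0: p = p0 using exact binomial probabilities.

    alternative is 'less' (H1: p < p0), 'greater' (H1: p > p0) or
    'two-sided' (H1: p != p0, with alpha/2 in each tail).
    Returns (tail probability compared with the criterion, True if H0 is rejected).
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    lower = sum(pmf[: x + 1])  # P(X <= x)
    upper = sum(pmf[x:])       # P(X >= x)
    if alternative == "less":
        return lower, lower <= alpha
    if alternative == "greater":
        return upper, upper <= alpha
    if alternative == "two-sided":
        tail = min(lower, upper)
        return tail, tail <= alpha / 2
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


print(binomial_test(7, 20, 0.25, 0.05, "greater"))    # Sid's card test
print(binomial_test(11, 15, 0.9, 0.05, "less"))       # migraine drug
print(binomial_test(1, 12, 0.35, 0.10, "two-sided"))  # Purple party
```

The two-sided rule here (half of a% in each tail) is the one used in this chapter. Library routines such as SciPy's `scipy.stats.binomtest` define the two-sided probability differently, so their output need not match.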


TYPE | AND TYPE Il ERRORS 


When you perform a significance test, there are four possibl 
shown in the table below. 


conclusions, and these are 


Two of the conclusions lead to correct decisions and the other two lead to wrong ones. 
The errors associated with wrong decisions are called Type I and Type Il errors. 


The outcomes and errors are summarised as follows: 


(a) H₀ is true and your test leads you to accept H₀: correct decision

(b) H₀ is true but your test leads you to reject H₀: wrong decision, Type I error
(c) H₀ is false but your test leads you to accept H₀: wrong decision, Type II error
(d) H₀ is false and your test leads you to reject H₀: correct decision


It can be helpful to see these on a diagram: 


                          Test decision
                          Accept H₀        Reject H₀
Actual      H₀ is true    correct          Type I error
situation   H₀ is false   Type II error    correct


You reject H₀ if the test value lies in the critical region. This region is fixed according to the
level of significance of the test, so the probability that the test value lies in the critical region
when H₀ is true is the same as the significance level. So for a test carried out at the α% level
of significance,


P(Type I error) = α%


To calculate the probability of making a Type II error, a specific value of the parameter in H₁
must be stated. The probability is then calculated as follows:


P(Type II error) = P(accept H₀ when H₁ is true).


This is illustrated in the following example. 


Example 10.3 


A random observation is taken from a binomial distribution X ~ B(20, p) and used to test the
null hypothesis p = 0.8 against the alternative hypothesis p > 0.8.

The critical region is chosen to be x ≥ 19.


(a) What is the significance level of the test? 
(b) What is the probability of making a Type I error? 
(c) Find the probability of making a Type II error if, in fact, p = 0.85. 
Solution 10.3
You are given that X ~ B(20, p). 
(a) H₀: p = 0.8
H₁: p > 0.8


The critical region is x ≥ 19, so to find the significance level of the test, find P(X ≥ 19).


P(X ≥ 19) = P(X = 19) + P(X = 20)
= ²⁰C₁₉ × 0.2 × 0.8¹⁹ + 0.8²⁰
= 0.0691...
≈ 7%
The significance level is approximately 7%.
(b) P(Type I error) = P(reject H₀ when H₀ is true)
H₀ is rejected if x ≥ 19, so
P(Type I error) = P(X ≥ 19 when p = 0.8) ≈ 7% (found above).
Note that this could have been stated directly from part (a), since the probability of a 
Type I error is the same as the significance level of the test. 
(c) You make a Type II error when you accept H₀ (which you will do if x < 19) when p is the
value specified in H₁ (not the value given by H₀).
The hypotheses are now
H₀: p = 0.8
H₁: p = 0.85
So P(Type II error) = P(accept H₀ when H₁ is true)
= P(X < 19 when p = 0.85)
= P(X < 19 when X ~ B(20, 0.85))

P(X ≤ 18) = 1 − P(X = 19) − P(X = 20)
= 1 − ²⁰C₁₉ × 0.15 × 0.85¹⁹ − 0.85²⁰
= 0.8244...
≈ 82%
The probability of making a Type II error is approximately 82%.


This is a very high probability. To make this smaller, you could increase the significance
level of the test. But this would, of course, increase the probability of making a Type I error.
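The figures in Solution 10.3 can be checked with a short sketch that uses only the Python standard library. The helper names `binom_pmf` and `upper_tail` are hypothetical; both error probabilities come from the same critical region x ≥ 19, evaluated once under p = 0.8 and once under p = 0.85.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))


n, c = 20, 19                          # critical region: x >= 19
type_1 = upper_tail(c, n, 0.80)        # P(reject H0 | p = 0.8)
type_2 = 1 - upper_tail(c, n, 0.85)    # P(accept H0 | p = 0.85)
print(f"P(Type I error)  = {type_1:.4f}")
print(f"P(Type II error) = {type_2:.4f}")
```

Changing `c` to 18 illustrates the trade-off described above: the Type I probability rises while the Type II probability falls.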
Exercise 10a - Testing p in a binomial distribution (small samples)
1. A certain type of seed has a germination rate of
70%. The seeds undergo a new treatment after
which 9 germinate in a packet of 10 seeds.
Stating suitable null and alternative hypotheses,
test, at the 5% level, whether this is evidence of
an increase in the germination rate.

2. … suspected that a die was biased in favour of
fours. She decided to carry out a hypothesis test.
(a) State suitable null and alternative
hypotheses for the test.
When she threw the die 15 times, she obtained a
four on 6 occasions.
(b) Carry out the test, at the 5% level, stating
your conclusion clearly.


3. The random variable X can be modelled by a
binomial distribution with n = 10.

A random observation, x, is taken from the
distribution.

Test, at the 8% level, the hypothesis that
p = 0.45 against the alternative hypothesis
p ≠ 0.45, (a) when x = 7, (b) when x = 1.


4. Records kept in a hospital show that 3 out of
every 10 casualties who come to the casualty
department have to wait more than half an hour
before receiving medical attention. The hospital
decided to increase the staffing in the department
by one person and it was then found that, of the
next 20 casualties, 2 had to wait for more than
half an hour for medical attention.
Test whether the new staffing has decreased the
number of casualties who have to wait more
than half an hour for medical attention.

Perform the test (a) at the 5% level, (b) at the
2% level. (L)


5. The random variable X can be modelled by a
binomial distribution with parameters n = 9 and
p, whose value is unknown.

(a) Find, at the 10% level of significance, the
critical region to test the null hypothesis that
p = 0.3 against the alternative hypothesis
that p > 0.3.

(b) Explain what is meant by a Type I error.

(c) State the probability of making a Type I
error in the test described in (a).


6. In each of the following, a random observation x
is taken from a binomial distribution X ~ B(n, p).
Test the given hypotheses at the significance level
stated.

       x    n    Hypotheses                     Level of significance
(a)    6    8    H₀: p = 0.45, H₁: p > 0.45     5%
(b)    …    …    H₀: p = 0.45, H₁: p < 0.45     5%
(c)    9    15   H₀: p = 0.35, H₁: p > 0.35     5%
(d)    9    15   H₀: p = 0.35, H₁: p ≠ 0.35     5%
(e)    2    9    H₀: p = 0.45, H₁: p < 0.45     5%
(f)    16   20   H₀: p = 0.45, H₁: p > 0.45     1%
(g)    5    7    H₀: p = 0.4,  H₁: p > 0.4      10%
(h)    2    20   H₀: p = 0.3,  H₁: p < 0.3      1%


7. A driving instructor claims that 95% of his
pupils pass their driving test at the first attempt.
Tom is considering having lessons with this
instructor but wonders whether 95% is an
overestimate. He decides to conduct a
significance test at the 5% level and discovers
that last month, out of the 15 pupils who took
the test for the first time, 11 passed.

(a) What would Tom decide about the driving
instructor's claim?

(b) Find the critical region for the number of
failures last month.


8. In a test of ten true-false questions, Sian got 8
correct. Test at the 5% level whether she could
have obtained this score by guessing all the
answers.


9. At a particular hospital it was found from past
records that the probability that a patient does
not turn up for an appointment is 0.3.

Following a campaign to make patients more
aware of the problems caused by missed
appointments, a significance test at the 10% level
was carried out to decide whether the campaign
had been effective in reducing the number of
patients who did not turn up for an
appointment. A random sample of 16 patients
was surveyed.

(a) Find the critical region for the test.

(b) Find the probability of making a Type II
error in the test described in (a) if in fact the
probability that a patient does not turn up
for an appointment is now 0.25.


10. Jessica is trying to find out whether a particular
coin is biased, so she performs a significance test.
She decides that she will say that the coin is
biased in favour of heads if, when she tosses it
15 times, at least two-thirds of the tosses result
in heads.

(a) What significance level did she use for her
test?

(b) What is the probability that she makes a
Type I error?

(c) If, in fact, the coin is biased, with
probability of 0.7 of obtaining a head, what
is the probability that she makes a Type II
error?


11. A random observation is taken from a binomial
distribution X ~ B(25, p) and used to test the null
hypothesis p = 0.4 against the alternative
hypothesis p < 0.4.

The critical region is chosen to be x < 6.

(a) At what significance level is the test carried
out?

(b) What is the probability of making a Type I
error?

(c) Find the probability of making a Type II
error if, in fact, p = 0.3.




SIGNIFICANCE TEST FOR A POISSON MEAN


This follows the same pattern as that for the binomial parameter p, as follows:

• Define X, the Poisson variable being considered, and the general form of its distribution,
for example X ~ Po(λ).
• State the null hypothesis H₀ and the alternative hypothesis H₁ concerning λ.
• State the distribution of X assuming that the null hypothesis is true, i.e. assuming that X
does follow a Poisson distribution with the value of λ specified in H₀, for example X ~ Po(λ₀).
• State the type of test (one-tailed or two-tailed), for example,
H₀: λ = λ₀, H₁: λ < λ₀ indicates a one-tailed test (lower tail considered for critical region)
H₀: λ = λ₀, H₁: λ > λ₀ indicates a one-tailed test (upper tail considered for critical region)
H₀: λ = λ₀, H₁: λ ≠ λ₀ indicates a two-tailed test (both tail ends considered for critical region)
• State the significance level of the test, for example α%. This defines the critical region.
• State the criterion for rejection of H₀, for example, for test value x,
H₁: λ < λ₀: reject H₀ if P(X ≤ x) < α%, i.e. if x lies in the lower tail α% of the distribution
H₁: λ > λ₀: reject H₀ if P(X ≥ x) < α%, i.e. if x lies in the upper tail α% of the distribution
H₁: λ ≠ λ₀: reject H₀ if P(X ≤ x) < ½α% or if P(X ≥ x) < ½α%, i.e. if x lies in the lower or
upper tail ½α% of the distribution
• Obtain your test value, x.
• Calculate the required probability to see whether x lies in the critical (rejection) region.
• Make your conclusion by rejecting H₀ or not. Then relate this to the context of the
situation being tested.

When λ is large, a normal approximation to the Poisson distribution can be applied. The test
is similar to the one for the normal approximation to the binomial described on page 528.


Example 10.4

The number of misprints in the classified advertisements pages of the Daily Informer is found
to have a Poisson distribution with average 6.5 misprints per page. A new proof reader is
employed and the number of misprints on a page was found to be 12. The editor said that the
average number of misprints had increased. Test this claim at the 5% level.


Solution 10.4

Let X be the number of misprints on the classified advertisements page.
Assuming that misprints occur randomly, X can be modelled by a Poisson distribution, X ~ Po(λ).

H₀: λ = 6.5
H₁: λ > 6.5 (the number of misprints has increased)

If H₀ is true, then X ~ Po(6.5).

Since the alternative hypothesis is λ > 6.5, use a one-tailed test at the 5% level and consider
the upper tail of the distribution for the critical region.

At the 5% level, the test value x will lie in the critical region if P(X ≥ x) < 5%.
Reject H₀ if P(X ≥ x) < 5%, where x is the test value.

The test value is x = 12, so find P(X ≥ 12).
Using cumulative Poisson tables (page 648),
P(X ≥ 12) = 1 − P(X ≤ 11)
= 1 − 0.9661
= 0.0339
= 3.39%

P(X ≤ r) where X ~ Po(6.5)
r     λ = 6.5
0     0.0015
1     0.0113
2     0.0430
3     0.1118
4     0.2237
5     0.3690
6     0.5265
7     0.6728
8     0.7916
9     0.8774
10    0.9332
11    0.9661
12    0.9840
13    0.9929
14    0.9971

Since P(X ≥ 12) < 5%, the sample value of 12 misprints lies in the critical region, so reject H₀
in favour of H₁.
There is evidence, at the 5% level, to support the editor's claim that the average number of
misprints has increased.

NOTE: by further calculation you will find that P(X ≥ 11) = 1 − 0.9332 = 0.0668 = 6.68% > 5%,
so the boundary for the critical region is between 11 and 12. The critical region is therefore
x ≥ 12 and the null hypothesis will be rejected if 12 or more misprints are found in the sample.

[Diagram: distribution of X ~ Po(6.5) for x = 0 to 17, with the 5% boundary of the upper tail
marked between 11 and 12]
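The NOTE in Solution 10.4 finds the boundary of the critical region by trying neighbouring values. A minimal sketch of the same idea, assuming exact Poisson probabilities in place of tables (the helper names `poisson_cdf` and `upper_critical_value` are hypothetical), searches for the smallest c with P(X ≥ c) < 5%:

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam); returns 0 when k < 0."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(0, k + 1))


def upper_critical_value(lam: float, alpha: float) -> int:
    """Smallest c with P(X >= c) < alpha, so the critical region is x >= c."""
    c = 0
    while 1 - poisson_cdf(c - 1, lam) >= alpha:
        c += 1
    return c


lam0, alpha, x = 6.5, 0.05, 12
p_value = 1 - poisson_cdf(x - 1, lam0)    # P(X >= 12) = 1 - P(X <= 11)
print(f"P(X >= {x}) = {p_value:.4f}")
print("reject H0:", p_value < alpha)
print("critical region: x >=", upper_critical_value(lam0, alpha))
```

The search steps upwards one value at a time, which mirrors the table method: P(X ≥ 11) is still above 5%, so the first value that qualifies is 12.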
Example 10.5

The number of breakdowns per week of an office photocopier can be modelled by a Poisson
distribution with mean 4.5. The photocopier was serviced and during the next week it broke
down just twice. Test, at the 10% level, whether there has been an improvement in the
reliability of the photocopier.


Solution 10.5

1. Define the variable.
Let X be the number of breakdowns in a week, where X ~ Po(λ).

2. State the hypotheses.
H₀: λ = 4.5
H₁: λ < 4.5 (the number of breakdowns has decreased, implying that the photocopier is
more reliable)

3. State the distribution according to H₀.
If H₀ is true, then X ~ Po(4.5).

4. State the type of test.
Since the alternative hypothesis is λ < 4.5, use a one-tailed test and consider the lower tail
of the distribution for the critical region.

5. State the rejection criterion.
At the 10% level, the sample value x will lie in the critical region if P(X ≤ x) < 10%.
Reject H₀ if P(X ≤ x) < 10%, where x is the test value.

6. Calculate the required probability.
The test value is 2, so find P(X ≤ 2).
Using cumulative Poisson tables (page 647),
P(X ≤ 2) = 0.1736 = 17.36%

P(X ≤ r) where X ~ Po(4.5)
r     λ = 4.5
0     0.0111
1     0.0611
2     0.1736
3     0.3423
4     0.5321
5     0.7029
6     0.8311

NOTE: if you do not have access to tables, then calculate P(X ≤ 2) as follows (see page 292):
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= e^(−4.5) (1 + 4.5 + 4.5²/2!)
= 0.17357...
= 17.36%

7. Make your conclusion.
Since P(X ≤ 2) > 10%, the test value of two breakdowns does not lie in the critical region,
so H₀ is not rejected.
There is no evidence, at the 10% level, of an improvement in the reliability of the photocopier.


Example 10.6

The number of flaws per metre of fabric follows a Poisson distribution with mean 2. With the
aim of reducing the number of flaws, the fabric is subjected to a different treatment.
After this treatment a significance test is devised to gauge whether it has been successful. The
test states that the number of flaws has decreased if a randomly selected 4-metre length of
cloth contains fewer than five flaws.

(a) State the null and alternative hypotheses for this significance test.
(b) Find the probability of making a Type I error when the test is carried out.
(c) State the significance level of the test.
(d) If, in fact, the treatment has reduced the number of flaws to one per metre, find the
probability of making a Type II error when applying the test described above.


Solution 10.6

See page 493 for the definitions of Type I and Type II errors.

(a) Let X be the number of flaws in a 4-metre length of fabric. Assuming that the flaws occur
randomly, X ~ Po(λ).
In a 4-metre length, the expected number of flaws is eight. The hypotheses for the
significance test would be
H₀: λ = 8
H₁: λ < 8 (the mean number of flaws in 4 metres is less than eight and the treatment
has been successful in reducing the number per metre)

(b) If H₀ is true, then X ~ Po(8).
You are told that values of x < 5 are in the critical region, so a test value of 4 or less
would lead to H₀ being rejected.
P(Type I error) = P(reject H₀ when H₀ is true)
= P(X < 5 when X ~ Po(8))
= P(X ≤ 4)
= 0.0996 (from tables)
≈ 10%
P(Type I error) = 0.10 (2 d.p.)

(c) The significance level of the test is the same as the probability of making a Type I error,
so the significance level is 10%.

(d) If, in fact, the number of flaws per metre has been reduced to 1, then the expected number
of flaws in a 4-metre length is four. The hypotheses become
H₀: λ = 8
H₁: λ = 4
P(Type II error) = P(accept H₀ when H₁ is true)
You reject H₀ when x < 5, so accept H₀ when x ≥ 5.
P(Type II error) = P(X ≥ 5 when λ = 4)
= 1 − P(X ≤ 4 when X ~ Po(4))
= 1 − 0.6288
= 0.3712
≈ 37%

P(X ≤ r) where X ~ Po(4)
r     λ = 4.0
0     0.0183
1     0.0916
2     0.2381
3     0.4335
4     0.6288
5     0.7851
6     0.8893

The probability of making a Type II error is approximately 37%.
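Solution 10.6 scales the rate per metre up to a 4-metre length before testing. A short sketch, assuming exact Poisson probabilities in place of tables (the helper name `poisson_cdf` is hypothetical), shows both error probabilities coming from the same critical region x ≤ 4:

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(0, k + 1))


rate_h0, rate_h1, length = 2, 1, 4    # flaws per metre under H0 and H1; metres of cloth
lam0 = rate_h0 * length               # 8 flaws expected under H0
lam1 = rate_h1 * length               # 4 flaws expected if the treatment works

# Critical region: fewer than five flaws, i.e. x <= 4
type_1 = poisson_cdf(4, lam0)         # P(X <= 4 | lambda = 8)
type_2 = 1 - poisson_cdf(4, lam1)     # P(X >= 5 | lambda = 4)
print(f"P(Type I error)  = {type_1:.4f}")
print(f"P(Type II error) = {type_2:.4f}")
```

Only the mean changes between the two calculations; the critical region stays fixed, which is why the significance level and the Type II probability can be found independently.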


1. The number of white corpuscles on a slide has a
Poisson distribution with mean 3.5. After
treatment, a sample was taken and the number
of white corpuscles was found to be 8. Test, at
the 5% level of significance, whether the number
of white corpuscles has increased.


2. The number of telephone calls to an office on a
weekday follows a Poisson distribution with a
mean number of six per hour.

(a) On Monday there were 5 calls between
10.00 a.m. and 10.30 a.m. Test, at the 5%
level, whether the number of calls has
increased.

(b) On Wednesday there were 3 calls between
11.00 a.m. and 12.30 p.m. Test, at the 5%
level, whether the number of calls has
decreased.


3. Over a long period of time, Jane has found that
the bus taking her to school arrives late on
average 9 times a month. In the month following
the start of the new summer schedules, Jane finds
that her bus arrives late 13 times.
Assuming that the number of times the bus is late
can be modelled by a Poisson distribution, test,
at the 5% level of significance, whether the new
schedules have in fact increased the number of
times on which the bus is late. State clearly your
null and alternative hypotheses.


4. A single observation is to be taken from a
Poisson distribution with mean λ and used to test
the null hypothesis λ = 8 against the alternative
hypothesis λ < 8.

The critical region is chosen to be x < 3.

(a) Find the probability of making a Type I
error.

(b) Find the probability of making a Type II
error if, in fact, λ = 6.


5. For each of the following, an observation, x, is
taken from a Poisson distribution, where
X ~ Po(λ).
Test the hypotheses at the level of significance
stated.

       x    Hypotheses                      Level of significance
(a)    14   H₀: λ = 7,   H₁: λ > 7          5%
(b)    12   H₀: λ = 7,   H₁: λ ≠ 7          5%
(c)    4    H₀: λ = 10,  H₁: λ < 10         1%
(d)    18   H₀: λ = 10,  H₁: λ > 10         5%
(e)    2    H₀: λ = 6.5, H₁: λ ≠ 6.5        5%
(f)    2    H₀: λ = 6.5, H₁: λ < 6.5        5%


6. In a particular city it was found, over a period of
time, that X, the number of cases of a certain 
medical condition reported in a month, has a 
Poisson distribution with mean 3.5. During the 
month of August, seven cases were reported. 
Stating a necessary assumption, perform a 
significance test at the 5% level to decide 
whether or not this number of reported cases 
suggests that the number of occurrences of the 
medical condition has increased. State your 
hypotheses and conclusions clearly. 


The following summary shows the stages of the hypothesis test. For details relating to the
particular distributions, see page 492 for the binomial test and page 496 for the Poisson test:


1. State the variable being considered.

2. State the null hypothesis H₀ and the alternative hypothesis H₁.
If you are looking for
an increase, then H₁: ... > ... (one-tailed test, upper tail)
a decrease, then H₁: ... < ... (one-tailed test, lower tail)
a change, then H₁: ... ≠ ... (two-tailed test, upper and lower tails)

3. Consider the appropriate distribution if the null hypothesis is true.

4. Decide on your rejection criterion. Decide on the significance level of the test. This fixes
the critical (rejection) region.

5. Now consider the value of the test statistic.

6. Perform any calculations necessary to find out whether the test statistic is in the critical
region.

7. Make your conclusion:
- If the test statistic is in the critical region, reject H₀ in favour of H₁.
- If the test statistic is not in the critical region, do not reject H₀.
Then relate this to the context of the situation being tested.


                          Test decision
                          Accept H₀        Reject H₀
Actual      H₀ is true    correct          Type I error
situation   H₀ is false   Type II error    correct


P(Type I error) = P(reject H₀ given that H₀ is true)
= α% (where α% is the significance level)


P(Type II error) = P(accept H₀ given that H₀ is false)
= P(accept H₀ given that H₁ is true)


Miscellaneous worked examples 


Example 10.7 




When I used to play darts regularly I scored a bull's-eye on average on 40% of attempts. After
a lapse of three months, I play darts one evening and score two bull's-eyes in 12 attempts. I
wish to test whether the percentage of attempts on which I score a bull's-eye has decreased.

(a) Stating a necessary assumption, use an exact binomial distribution to carry out the test,
using a 10% significance level.
(b) Comment on the validity of the assumptions made in (a). (C)


Solution 10.7


(a) Let X be the number of bull's-eyes scored in 12 attempts.

Assuming that the result of an attempt is independent of the results of all other attempts,
X can be modelled by a binomial distribution, where X ~ B(12, p).

H₀: p = 0.4 (I score a bull's-eye on 40% of the attempts.)
H₁: p < 0.4 (The percentage of attempts on which I score a bull's-eye has decreased.)
If H₀ is true, then X ~ B(12, 0.4).

Carry out a one-tailed (lower tail) test at the 10% level.

Reject H₀ if P(X ≤ x) < 10%, where x is the test value.

The test value is x = 2.
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= 0.6¹² + ¹²C₁ × 0.6¹¹ × 0.4 + ¹²C₂ × 0.6¹⁰ × 0.4²
= 0.08344...
≈ 8.3%
(You can use cumulative binomial tables if they are available.)

Since P(X ≤ 2) < 10%, the sample value x = 2 lies in the critical region, so H₀ is rejected in
favour of H₁. At the 10% significance level, there is evidence that the percentage of
attempts on which I score a bull's-eye has decreased.

(b) Performance usually improves when making several attempts, so it is unlikely that the
attempts are independent. If they are not independent, then a binomial model is not suitable
and the test is not valid.


Example 10.8 


A die is suspected of bias towards showing more sixes than would be expected from a fair
die. In order to test this, it is decided to throw the die 12 times. The null hypothesis p = 1/6,
where p is the probability of the die showing a six, will be rejected in favour of the alternative
hypothesis p > 1/6 if the number of sixes obtained is 4 or more. Calculate, to three decimal
places, the probability of making

(a) a Type I error,
(b) a Type II error if, in fact, p = 1/2. (C)


Solution 10.8


(a) Let X be the number of sixes obtained when the die is thrown 12 times.
Then X ~ B(12, p).

H₀: p = 1/6 (The die is fair)
H₁: p > 1/6 (The die is biased in favour of sixes)

P(Type I error) = P(reject H₀ when H₀ is true)
If H₀ is true, then X ~ B(12, 1/6).

Also H₀ is rejected if x ≥ 4,
so P(Type I error) = P(X ≥ 4 when X ~ B(12, 1/6))
= 1 − P(X ≤ 3)
= 1 − [(5/6)¹² + 12 × (1/6) × (5/6)¹¹ + ¹²C₂ × (1/6)² × (5/6)¹⁰ + ¹²C₃ × (1/6)³ × (5/6)⁹]
= 0.1251...
= 0.125 (3 d.p.)

(b) The hypotheses now become
H₀: p = 1/6
H₁: p = 1/2

P(Type II error) = P(accept H₀ when H₁ is true)

If H₁ is true, then p = 1/2 and X ~ B(12, 1/2)

H₀ is accepted if x < 4,

so P(Type II error) = P(X < 4 when X ~ B(12, 1/2))
= (1/2)¹² + 12 × (1/2)¹² + ¹²C₂ × (1/2)¹² + ¹²C₃ × (1/2)¹²
= 0.073 (3 d.p.)
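Because the probabilities in Example 10.8 are simple fractions, they can also be computed exactly. This sketch assumes Python's standard `fractions.Fraction` type for exact rational arithmetic; the helper name `binom_cdf` is hypothetical. Exact arithmetic makes it easy to see that the Type II probability is exactly 299/4096, since 1 + 12 + 66 + 220 = 299.

```python
from fractions import Fraction
from math import comb


def binom_cdf(k: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X <= k) for X ~ B(n, p), with p given as a Fraction."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n = 12
# H0: p = 1/6 is rejected in favour of H1: p > 1/6 when x >= 4
type_1 = 1 - binom_cdf(3, n, Fraction(1, 6))   # P(X >= 4 | p = 1/6)
type_2 = binom_cdf(3, n, Fraction(1, 2))       # P(X <= 3 | p = 1/2)
print(round(float(type_1), 3), round(float(type_2), 3))   # 0.125 0.073
print(type_2)                                             # 299/4096
```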




… a rate of 0.5 per m². In an attempt to reduce the numb…


(a) Stating your hypotheses clearly, test at the 10% level of significance whether or not the 
rate of occurrence of flaws using the new procedure has decreased. 


The new procedure actually produces flaws at a rate of 0.3 per m².


(b) Find the probability of making a Type II error using the test in part (a). (L) 


Solution 10.9 


Let X be the number of flaws in an 8 m² window. Then X ~ Po(λ).
(a) H₀: λ = 4
H₁: λ < 4 (the number of flaws has decreased)
If H₀ is true, then X ~ Po(4).
Use a one-tailed test and consider the lower tail for the critical region.
At the 10% level the value x = 1 will be in the critical region if P(X ≤ 1) < 10%, therefore
reject H₀ if P(X ≤ 1) < 10%.
P(X ≤ 1) = P(X = 0) + P(X = 1)
= e^(−4)(1 + 4)
= 0.0915...
Since P(X ≤ 1) < 10%, reject H₀ in favour of H₁.


There is evidence, at the 10% level, that the rate of occurrence of flaws using the new
procedure has decreased.
(b) The hypotheses now become

H₀: λ = 4

H₁: λ = 2.4

If H₁ is true, then X ~ Po(2.4).


P(Type II error) = P(accept H₀ when H₁ is true)
The critical region is x ≤ 1, so you would accept H₀ if x > 1.
P(Type II error) = P(X > 1 when X ~ Po(2.4))
= 1 − e^(−2.4)(1 + 2.4)
= 0.6916
≈ 69%
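In Solution 10.9 the Poisson mean comes from a rate per m² multiplied by the area of the window. A minimal sketch, assuming exact Poisson probabilities (the helper name `poisson_cdf` is hypothetical), reproduces both parts:

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(0, k + 1))


area = 8                   # m^2 of window
lam0 = 0.5 * area          # rate under H0: 0.5 flaws per m^2 gives lambda = 4
lam1 = 0.3 * area          # actual new rate 0.3 per m^2 gives lambda = 2.4
x, alpha = 1, 0.10         # observed number of flaws, significance level

p_value = poisson_cdf(x, lam0)           # P(X <= 1 | lambda = 4)
print(f"P(X <= 1) = {p_value:.4f}, reject H0: {p_value < alpha}")

# The critical region is x <= 1, so H0 is accepted when x >= 2
type_2 = 1 - poisson_cdf(1, lam1)        # P(X >= 2 | lambda = 2.4)
print(f"P(Type II error) = {type_2:.4f}")
```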




Miscellaneous exercise 10c - Binomial and Poisson tests 




1. Before I sat an examination, my teacher told me
that I had a 60% chance of obtaining a grade A,
but I thought I had a better chance than that.

In preparation for the examination, we did seven
tests, each of the same standard as the
examination. Assuming my teacher is right, find
the probability that I would get a grade A on

(a) all 7 tests,

(b) exactly 6 tests out of 7,

(c) exactly 5 tests out of 7.

In fact I got a grade A on 6 tests out of 7. State
suitable null and alternative hypotheses and
carry out a statistical test to determine whether
or not there is evidence that my teacher is
underestimating my chances of a grade A. (MEI)


2. Harry Hotspur is a footballer who likes to take
penalty kicks. On past performance he reckons
that on average he scores 7 times out of 10.
Assume that Harry is correct, and consider the
next 8 penalty kicks he takes.

(a) Find the probability that he will score at
least 6 times.

(b) Find the modal score and state its
probability.

(c) What further assumption have you made in
calculating the probabilities in (a) and (b)?

After a period of intense practice, Harry reckons 

that he has improved his penalty taking. 

(d) Write down suitable null and alternative 
hypotheses for testing the value of p, the 
probability that Harry scores from a penalty 
kick. 


He takes 15 penalty kicks and scores from 13 of 
them. 


(e) Carry out the hypothesis test, at the 10% 
level of significance, stating your conclusion 
clearly. 

(f) Harry takes a further set of 15 penalty 
kicks. Out of the total of 30 kicks he scores 
from 26. Without further calculation 
explain carefully whether this additional 


information strengthens Harry’s case or not. 
(MEI) 


3. The manufacturers of a certain type of
microwave oven claim that at least 95% of their
ovens will not fail during the first two years of
use. In order to test this claim, a Consumer
Agency purchased a random sample of 15 ovens
and ran them under similar conditions over a
two-year period. It was found that 12 ovens had
not failed during that period.

Test the manufacturer's claim using an exact
binomial distribution. The significance level
should be as close as possible to 5%.
Explain why an exact 5% significance level is not
possible. (C)


4. The ABC School of Motoring claim that at least 


80% of their pupils pass the driving test first 
time. The XYZ School of Motoring suspect that 
more than 20% of ABC’s pupils fail first time. 
They test this suspicion by checking the results of 
a random sample of 25 former ABC pupils, 
finding out how many failed first time. 


(a) State suitable null and alternative
hypotheses to be used in the test. 

(b) Identify the model that should be used for 
the distribution of the number of failures. 

(c) Find the smallest number of failures which 
would allow ABC’s claim to be rejected at 
the 5% level of significance. (NEAB) 


5. For most small birds, the ratio of males to 


females may be expected to be about 1:1. In one

ornithological study birds are trapped by setting 

fine-mesh nets. The trapped birds are counted 
and then released. The catch may be regarded as 

a random sample of the birds in the area. 

The ornithologists want to test whether the sex 

ratio of blackbirds is, in fact, 1:1. 

(a) Assuming that the sex ratio of blackbirds is 
1:1, find the probability that a random 
sample of 16 blackbirds contains 
(i) 12 males (ii) at least 12 males 
(iii) at least 12 of the same sex. 

(b) State the null and alternative hypotheses the 
ornithologists should use, clearly indicating 
why the alternative hypothesis takes the 
form it does. 


In one sample of 16 blackbirds there are 12 

males and 4 females. 

(c) Carry out a suitable test using these data at
the 5% significance level, stating your
conclusion clearly. Find the critical region
for the test.

(d) Another ornithologist points out that, 
because female birds spend much time 
sitting on the nest, females are less likely to 
be caught than males. Explain how this 
would affect your conclusions. (MEI) 


6. Over many years it has been found that at a
particular station 20% of trains arrive late. A
consumer group wishes to test whether the
percentage of trains arriving late has increased
recently. It decides to observe 20 trains. If more
than four of the trains arrive late it will claim
that the percentage of trains arriving late has
increased.

(a) In the case where the percentage of trains
arriving late has remained at 20%, find the
probability that the consumer group makes
a Type I error.

(b) In the case where the percentage of trains
arriving late has increased to 25%, find the
probability that the consumer group makes
a Type II error.

(c) (i) Comment on your answer to part (a).
(ii) Suggest an improvement to the
procedure used by the consumer group.
(NEAB)


7. A firm producing mugs has a quality control
scheme in which a random sample of 10 mugs
from each batch is inspected. For 50 such samples,
the numbers of defective mugs are as follows:


Number of 
defective mugs 0° 100020030 45 be 
Number of 
samples Se T3 TS D2 A 2 0. 


(a) Find the mean and standard deviation of the 
number of defective mugs per sample. 

(b) Show that a reasonable estimate for p, the
probability that a mug is defective, is 0.2.
Use this figure to calculate the probability
that a randomly chosen sample will contain
exactly 2 defective mugs. Comment on the
agreement between this value and the
observed data.


The management is not satisfied with 20% of 
mugs being defective and introduces a new process 
to reduce the proportion of defective mugs. 


(c) A random sample of 20 mugs, produced by 
the new process, contains just one which is 
defective. Test, at the 5% level, whether it is 
reasonable to suppose that the proportion of 
defective mugs has been reduced, stating 
your null and alternative hypotheses clearly. 

(d) What would the conclusion have been if the 
management had chosen to conduct the test 
at the 10% level? (MEI) 


8. In a certain country, 90% of letters are delivered
the day after posting.
A resident posts 8 letters on a certain day. 
Find the probability that: 


(a) all 8 letters are delivered the next day, 
(b) at least 6 letters are delivered the next day, 


(c) exactly half the letters are delivered the next 
day. 


It is later suspected that the service has 
deteriorated as a result of mechanisation. To test 
this, 17 letters are posted and it is found that 
only 13 of them arrive the next day. Let p denote 
the probability, after mechanisation, that a letter 
is delivered the next day. 


(d) Write down suitable null and alternative 
hypotheses for the value of p. Explain why 
the alternative hypothesis takes the form it 
does. 

(e) Carry out the hypothesis test, at the 5% level
of significance, stating your results clearly.

(f) Write down the critical region for the test,
giving a reason for your choice. (MEI)


9. It is known that the number of defects in a 
one-metre length of steel pipe has mean 2.4. It 
has been suggested that a Poisson distribution 
would be a reasonable model for the number of 
defects in a randomly chosen one-metre length of 
this steel pipe. 


(a) State two assumptions that would need to 
be made for a Poisson distribution to be an 
appropriate model in this case. 

(b) Using this Poisson model, calculate the 
probability that in a randomly chosen 
one-metre length of steel pipe there are: 

(i) exactly 3 defects,
(ii) more than 3 defects.

(c) Determine the probability that there are
exactly 6 defects in a randomly chosen two-metre
length of the same type of steel pipe.

(d) It is believed that the manufacturing process
may now be producing more defects than
before. In a quality control experiment a
one-metre length of the steel pipe is chosen
and is found to have 7 defects. Test, at the
5% level of significance, the hypothesis that
the number of defects in this type of steel
pipe has increased. State your hypotheses
clearly. (O)


10. (a) The number, X, of breakdowns per day of
the lifts in a large block of flats has a
Poisson distribution with mean 0.2. Find, to
three decimal places, the probability that on
a particular day

(i) there will be at least one breakdown,
(ii) there will be at most two breakdowns.

(b) Find, to three decimal places, the probability
that, during a 20-day period, there will be
no lift breakdowns.

(c) The maintenance contract for the lifts is
given to a new company. With this company
it is found that there are two breakdowns
over a period of 30 days. Perform a
significance test at the 5% level to decide
whether or not the number of breakdowns
has decreased. (L)


11. The number, X, of emergency telephone calls to
a gas board office in ¢ minutes at weekends is 
known to follow a Poisson distribution with 
mean ggt. Given that the telephone in that office 
is unmanned for 10 minutes, calculate, to two 
significant figures, the probability that there will 
be at least 2 emergency telephone calis to the 
office during that time. 

Find, to the nearest minute, the length of time 
that the telephone can be left unmanned for there 
to be a probability of 0.9 that no emergency 
telephone call is made to the office during the 
period the telephone is unmanned. 

During a week of very cold weather it was found 
that there had been 10 emergency telephone calls 
to the office in the first 12 hours of the weekend. 
Using tables, or otherwise, determine whether 
the increase in the average number of emergency 
telephone calls to that office is significant at the 
5% level. (L) 
Mixed test 10A (Binomial)


1. The random variable, R, can be modelled by a
binomial distribution with parameters n = 10 and
p, whose value is unknown.
Find the critical region for the test of

H₀: p = 0.5 against H₁: p ≠ 0.5
at the 10% level of significance. (NEAB)


2. A large college introduced a new procedure to
try to ensure that staff arrived on time for the
start of lectures. A recent survey by the students
had suggested that in 15% of cases the staff
arrived late for the start of a lecture. In the first
week following the introduction of this new
procedure a random sample of 35 lectures was
taken and in only one case did the member of
staff arrive late.

(a) Stating your hypotheses clearly, test, at the
5% level of significance, whether or not
there is evidence that the new procedure has
been successful.

A student complained that this sample did not
give a true picture of the effectiveness of the new
procedure.

(b) Explain briefly why the student’s claim
might be justified and suggest how a more
effective check on the new procedure could
be made. (L)


3. An enthusiastic gardener claimed that she could
never work in the garden at the weekend because
‘It always rains on Saturday and Sunday when
I’m at home and it’s always fine on weekdays
when I’m not!’ She noted the weather for the
next month and recorded that, out of 10 wet
days, five were either a Saturday or a Sunday.
The gardener’s claim may be modelled by
regarding her observation as a single sample
from a B(10, p) distribution. Given that one
would expect 2 out of every 7 wet days to be
either a Saturday or a Sunday, the null
hypothesis, p = 2/7, may be tested against the
alternative hypothesis, p > 2/7. Carry out a
hypothesis test to test her claim at the 10%
significance level. (C)


Mixed test 10B (Poisson)


1. The mean number of serious accidents at a
motorway interchange is 2.1 per week.

(a) State the probability distribution which may
reasonably be used to model the weekly
number of serious accidents at this
motorway interchange, and give its
parameter.
(b) Use an appropriate distribution to determine
the probability that the number of serious
accidents is:
(i) two or fewer in a randomly selected
week;
(ii) exactly one on a randomly selected day.
(c) Given that there were 6 serious accidents
during one wet winter week, test, at the 5%
level of significance, the hypothesis that the
accident rate is higher in wet weather. (O)


2. (a) The number of bacterial colonies that
develop in dishes of nutrient exposed to an
infected environment has a Poisson
distribution with mean 7.5.

(i) Calculate the probability that, in one
such dish, the number of bacterial
colonies that develop will be greater
than 10.
(ii) Calculate the probability that, in two
such dishes, the total number of
bacterial colonies that develop will be
between 10 and 20 inclusive.

(b) Experiments were conducted to determine
the effectiveness of an antibiotic spray in
reducing the number of bacterial colonies
that develop.
In one experiment in which one dish was
sprayed, the number of bacterial colonies
that developed was 3. Stating suitable null
and alternative hypotheses, determine
whether or not this result provides
significant evidence at the 5% level that the
spray is effective. (NEAB)


3. A single observation is taken from a Poisson
distribution with mean μ and used to test the
hypothesis μ = 6 against the alternative
hypothesis μ > 6.

The critical region is chosen to be x > 11.

(a) At what significance level is the test carried
out?

(b) Find the probability of making a Type II
error if, in fact, μ = 8.5.


11 


Hypothesis testing (z-tests and t-tests) 


In this chapter you will

• be reminded about the language of hypothesis (significance) testing introduced in Chapter 10
• be reminded about Type I and Type II errors
• learn how to perform the following hypothesis tests:
Test 1: Testing μ, the mean
  1a: of a normal distribution with known variance, any size sample (z-test)
  1b: of a distribution with known variance, large sample (z-test)
  1c: of a distribution with unknown variance, large sample (z-test)
  1d: of a normal distribution with unknown variance, small sample (t-test)
Test 2: Testing p, the proportion of a binomial distribution, n large (z-test)
Test 3: Testing μ₁ − μ₂, the difference between means of two normal distributions
  3a: when population variances are known (z-test)
  3b: when there is a known common population variance (z-test)
  3c: when the common population variance is unknown,
    – large samples (z-test)
    – small samples (t-test)


Background knowledge:

For the z-tests you will need to be familiar with
– the normal distribution and the use of standard normal tables
– the distribution of the sample mean
– the unbiased estimate for the population variance (see page 447)
– the normal approximation to the binomial distribution (see page 382)

For the t-tests you will need to be familiar with
– the use of the t-distribution tables (see page 463)


HYPOTHESIS TESTING 


If you have worked through Chapter 10 you will be familiar with the terminology and methods used to carry out hypothesis tests relating to discrete distributions. For those new to the topic, these are described again in the following text, but this time in relation to continuous distributions. The following example illustrates the hypothesis test for the mean of a normal distribution.
In the production of ice packs for use in cool boxes, a machine fills packs with liquid and the
packs are then frozen. Since space is needed in the packs for the liquid to expand, it is 
important that they are not over-filled. The volume of liquid in the packs follows a normal 
distribution with mean 524 ml and standard deviation 3 ml. 


The machine breaks down and is repaired. In the next batch of production, there is a 
suspicion that the mean volume of liquid dispensed by the machine into the packs has 
increased and is greater than 524 ml. In order to investigate this, the supervisor takes a 
random sample of 50 packs and finds that the mean volume of liquid in these is 524.9 ml.
Does this provide evidence that the machine is over-dispensing? 


The mean volume of the sample, 524.9 ml, is higher than the established mean of 524 ml. But 
is it high enough to say that the mean volume of all the packs filled by the machine has 
increased? Perhaps the mean is still 524 ml and this higher value has occurred just because of 
sampling variation. A hypothesis (or significance) test will enable a decision to be made that is 
backed by statistical theory, not just based on a suspicion. 


Let X be the volume, in millilitres, of liquid dispensed into a pack after the machine has been
repaired and let the mean of X be μ, where μ is unknown. Assuming that the standard
deviation remains unchanged, X ~ N(μ, σ²) with σ = 3.

The hypothesis is made that μ is 524 ml, i.e. the mean has remained the same as it was prior
to the repair. This is known as the null hypothesis, H₀, and is written

H₀: μ = 524

Since it is suspected that the mean volume has increased, the alternative hypothesis, H₁, is that
the mean is greater than 524 ml. This is written

H₁: μ > 524


To carry out the test, the focus moves from X, the volume of liquid in a pack, to the
distribution of X̄, the mean volume of a sample of 50 packs. In this test, X̄ is known as the
test statistic and its distribution is needed. The distribution of X̄ is known as the sampling
distribution of means.

In Chapter 9 you saw that if X ~ N(μ, σ²), then, for samples of size n,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

The hypothesis test starts by assuming that the value stated in the null hypothesis is true, so
μ = 524.

Since σ = 3 and n = 50,

$$\bar{X} \sim N\left(524, \frac{3^2}{50}\right), \text{ i.e. } \bar{X} \sim N(524, 0.18)$$

The sampling distribution of means, therefore, follows a normal distribution with mean
524 ml and variance 0.18 ml². The standard deviation is √0.18 ml.
NOTE: This is sometimes left in the form σ/√n = 3/√50.
The result of the test depends on the whereabouts in the sampling distribution of the test value
of 524.9 ml, the mean volume of the sample of 50 packs taken by the supervisor. You need to
find out whether 524.9 is close to 524 or far away from it.

If it is close to 524 then it is likely to have come from a distribution with mean 524 ml and
there would not be enough evidence to say that the mean volume has increased.

If it is far away from 524, i.e. in the right-hand (upper) tail of the distribution, then it is
unlikely to have come from a distribution with a mean of 524 ml. The mean is likely to be
higher than 524 ml.

[Figures: distribution of X̄ centred at 524, with 524.9 (a) close to 524 and (b) far away from 524, in the upper tail.]

Note that the upper tail is being used because the supervisor suspects that there is an increase
in μ. This type of test is called a one-tailed (upper tail) test.

A decision needs to be taken about the cut-off point, c, known as the critical value, which
indicates the boundary of the region where values of X̄ would be considered to be too far away
from 524 ml and therefore would be unlikely to occur. The region is known as the critical
region or rejection region.


The critical value and region are fixed using probabilities linked to the significance level of the
test. In general, for an upper tail test at the α% level, the critical value c is fixed so that
P(X̄ > c) = α% and the critical (rejection) region is X̄ > c.

[Figure: distribution of X̄ centred at μ, with α% shaded in the upper tail, labelled critical (rejection) region, X̄ > c.]

The hypothesis test involves finding whether or not the sample value, x̄, lies in the critical
region of the sampling distribution of means, X̄.


In this example, if x̄ lies in the critical region, then a decision is taken that it is too far away
from 524 ml to have come from a distribution with this mean. In statistical language, you
reject the null hypothesis, H₀ (that the mean is 524 ml), in favour of the alternative
hypothesis, H₁ (that the mean is greater than 524 ml).

If x̄ does not lie in the critical region, there is not enough evidence to reject H₀, so H₀ is
accepted. In this example, x̄ < c is known as the acceptance region.

For a significance level of α%, if the sample mean lies in the critical (or rejection) region, then
the result is said to be significant at the α% level.

Note that if a result is significant at, say, the 1% level, then it is automatically significant at
any level greater than 1%, for example 5% or 10%.

Say that the supervisor chooses a significance level of 5%. She will then reject H₀ if the test
value (i.e. the mean volume of the sample of 50 packs) lies in the upper 5% tail of the
distribution of sample means.


Since this distribution is normal, instead of finding c, the critical x̄ value, it is possible to work
in standardised values and find the z-value that gives 5% in the upper tail. Using standard
normal tables (page 649),

if P(Z > z) = 0.05
then P(Z < z) = 1 − 0.05 = 0.95
i.e. Φ(z) = 0.95
z = Φ⁻¹(0.95)
  = 1.645

[Figure: Z ~ N(0, 1) with the critical region z > 1.645 shaded.]

So z-values that are greater than 1.645 lie in the upper 5% tail of the distribution.
This enables a statement to be made, known as the rejection criterion, which tells you when to
reject the null hypothesis:
Reject H₀ if z > 1.645, where z is the standardised value of the mean of the sample of 50
packs,

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{x} - 524}{3/\sqrt{50}}$$

Note that to avoid being influenced by sample readings, it is important that the rejection
criterion is decided upon before any sample values are taken.


When the sample was taken, it was found that x̄ = 524.9, so

$$z = \frac{524.9 - 524}{3/\sqrt{50}} = 2.12 \text{ (2 d.p.)}$$

[Figure: Z ~ N(0, 1) with the upper 5% tail beyond 1.645 shaded; the test value z = 2.12 lies in this tail.]

The result of the test is now stated in statistical terms and then related to the context of the
test, as follows:

Since z > 1.645, H₀ is rejected in favour of H₁. The supervisor would conclude that the mean
volume of liquid being dispensed by the machine is not 524 ml, but has increased. She would
be wise, therefore, to stop production so that the setting on the machine could be adjusted.


Note that the critical x̄-value, c, can be found by de-standardising the critical
z-value of 1.645, where

$$\frac{c - 524}{3/\sqrt{50}} = 1.645$$

$$c = 524 + 1.645 \times \frac{3}{\sqrt{50}} = 524.7 \text{ (1 d.p.)}$$

[Figure: distribution of X̄ centred at 524 with the critical region X̄ > 524.7 shaded.]

Since the test value of 524.9 is greater than 524.7, it lies in the critical region, confirming the
result obtained above.
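The standardise and de-standardise steps above can be checked numerically. This is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist` in place of printed tables; it uses the unrounded critical value (about 1.6449), so the last digit may differ slightly from table-based working.

```python
from math import sqrt
from statistics import NormalDist

# Ice-pack example: H0: mu = 524, H1: mu > 524, sigma = 3, n = 50
mu0, sigma, n = 524.0, 3.0, 50
xbar = 524.9
alpha = 0.05

se = sigma / sqrt(n)                       # standard deviation of X-bar, sqrt(0.18)
z = (xbar - mu0) / se                      # standardised test value
z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tail 5% critical z-value
c = mu0 + z_crit * se                      # de-standardised critical value of X-bar

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, c = {c:.1f}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```

Comparing z with the critical z-value and comparing x̄ with c are equivalent, so both lines lead to the same decision.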


If you want even more information, you can find out exactly where the sample mean lies in
the distribution of X̄. Note that this is the method used in Chapter 10 for discrete variables.

$$\bar{X} \sim N\left(524, \frac{3^2}{50}\right)$$

so

$$P(\bar{X} > 524.9) = P\left(Z > \frac{524.9 - 524}{3/\sqrt{50}}\right) = P(Z > 2.1213\ldots) = 1 - 0.9831 = 0.0169 = 1.7\%$$

[Figure: distribution of X̄ centred at 524 with the 1.7% tail above 524.9 shaded and the boundary for 5% marked to its left.]

This probability is less than 5%, implying that the boundary for the critical region must lie to
the left of the sample value of 524.9 and confirming that 524.9 lies in the critical region. This
method also tells you that the test value of 524.9 will lie in the critical region for any level of
significance above 1.7%.

This probability method can be used, if preferred, in the hypothesis test to find whether the
sample value lies in the rejection region. In this example, for a 5% level of significance, the
rejection criterion would be to reject H₀ if P(X̄ > x̄) < 0.05, where x̄ is the sample mean.
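The probability method amounts to evaluating the upper-tail area of the sampling distribution beyond the sample mean. A minimal sketch, assuming Python 3.8+ with `statistics.NormalDist` standing in for standard normal tables:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 524.0, 3.0, 50, 524.9

# Sampling distribution of X-bar when H0 is true: N(524, 3^2/50)
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Upper-tail probability P(X-bar > 524.9)
p_value = 1 - sampling_dist.cdf(xbar)

print(f"P(X-bar > {xbar}) = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

Because the computed value is about 0.017, the same decision would be reached at any significance level above roughly 1.7%.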


ONE-TAILED AND TWO-TAILED TESTS 


Say that the null hypothesis is μ = μ₀.
In a one-tailed test, the alternative hypothesis H₁ looks for an increase or a decrease in μ:

for an increase, H₁ is μ > μ₀ and the critical region is in the upper tail,

[Figure: distribution centred at μ₀ with the critical region in the upper tail.]

for a decrease, H₁ is μ < μ₀ and the critical region is in the lower tail.

[Figure: distribution centred at μ₀ with the critical region in the lower tail.]

In a two-tailed test, the alternative hypothesis H₁ looks for a change in μ without specifying
whether it is an increase or a decrease, and H₁ is μ ≠ μ₀. The critical region is in two parts:

[Figure: distribution centred at μ₀ with a critical region in each tail.]


CRITICAL z-VALUES 


Critical values depend on the significance level and also on whether the test is one-tailed or
two-tailed. The method of working in standardised values is widely used for tests involving the
normal distribution because the critical z-values can be found easily from standard normal
tables, as described on page 529. Sometimes the most commonly needed values are collected in
a critical value table. One such table is shown below and it is also given at the bottom of
page 649. It gives the z-values for various values of p, where p = Φ(z).

p   0.75   0.90   0.95   0.975  0.99   0.995  0.9975  0.999  0.9995
z   0.674  1.282  1.645  1.960  2.326  2.576  2.807   3.090  3.291

For example, for a one-tailed test at the 1% level, you want to find z such that Φ(z) = 0.99, so
look up p = 0.99. From the table, z = 2.326. Therefore the upper tail critical value is 2.326.
By symmetry, the lower tail critical value is −2.326.

[Figures: Z ~ N(0, 1) with the upper 1% tail beyond 2.326 shaded, and Z ~ N(0, 1) with the lower 1% tail below −2.326 shaded.]

For a two-tailed test at the 1% level, the 1% in the tails is split evenly between the upper and
lower tails, with 0.5% in each. There are two critical values.
To find the upper tail value, you need to find z such that Φ(z) = 0.995, so look up p = 0.995.
From the table, z = 2.576.
So the upper tail critical value is 2.576 and the lower tail value (by symmetry) is −2.576.

Critical values: ±2.576
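Each entry in the critical value table is Φ⁻¹(p). The sketch below regenerates the table, assuming Python 3.8+ and its standard-library `statistics.NormalDist.inv_cdf` as the inverse of Φ:

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)

# p-values from the critical value table, where p = Phi(z)
ps = [0.75, 0.90, 0.95, 0.975, 0.99, 0.995, 0.9975, 0.999, 0.9995]

# z = Phi^{-1}(p), rounded to 3 d.p. as in the table
critical_z = {p: round(std_normal.inv_cdf(p), 3) for p in ps}

for p, z in critical_z.items():
    print(f"p = {p:<7} z = {z:.3f}")
```

The lower-tail values follow by symmetry, for example Φ⁻¹(0.01) = −Φ⁻¹(0.99).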


SUMMARY OF CRITICAL VALUES AND REJECTION CRITERIA 


The summary below shows the critical z-values and rejection criteria for the most commonly
used levels of significance: 10%, 5% and 1%. In every case H₀: μ = μ₀.

10% significance level
  One-tailed test (lower tail), H₁: μ < μ₀: reject H₀ if z < −1.282
  One-tailed test (upper tail), H₁: μ > μ₀: reject H₀ if z > 1.282
  Two-tailed test, H₁: μ ≠ μ₀: reject H₀ if z < −1.645 or z > 1.645 (written |z| > 1.645)

5% significance level
  One-tailed test (lower tail), H₁: μ < μ₀: reject H₀ if z < −1.645
  One-tailed test (upper tail), H₁: μ > μ₀: reject H₀ if z > 1.645
  Two-tailed test, H₁: μ ≠ μ₀: reject H₀ if z < −1.96 or z > 1.96 (written |z| > 1.96)

1% significance level
  One-tailed test (lower tail), H₁: μ < μ₀: reject H₀ if z < −2.326
  One-tailed test (upper tail), H₁: μ > μ₀: reject H₀ if z > 2.326
  Two-tailed test, H₁: μ ≠ μ₀: reject H₀ if z < −2.576 or z > 2.576 (written |z| > 2.576)
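The rejection criteria in this summary can be expressed as a small function. This is a sketch only: the names `critical_values` and `reject_h0` are hypothetical helpers, and `statistics.NormalDist` (Python 3.8+) is assumed in place of the printed tables.

```python
from statistics import NormalDist


def critical_values(alpha: float, tail: str) -> tuple:
    """Critical z-value(s) for a z-test at significance level alpha (e.g. 0.05).

    tail is 'lower' (H1: mu < mu0), 'upper' (H1: mu > mu0) or 'two' (H1: mu != mu0).
    """
    inv_phi = NormalDist().inv_cdf
    if tail == "lower":
        return (inv_phi(alpha),)
    if tail == "upper":
        return (inv_phi(1 - alpha),)
    if tail == "two":
        half = alpha / 2  # alpha is split evenly between the two tails
        return (inv_phi(half), inv_phi(1 - half))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def reject_h0(z_value: float, alpha: float, tail: str) -> bool:
    """Apply the rejection criterion to a standardised test value."""
    crit = critical_values(alpha, tail)
    if tail == "lower":
        return z_value < crit[0]
    if tail == "upper":
        return z_value > crit[0]
    return abs(z_value) > crit[1]  # two-tailed: |z| > upper critical value


for alpha in (0.10, 0.05, 0.01):
    row = {t: [round(c, 3) for c in critical_values(alpha, t)] for t in ("lower", "upper", "two")}
    print(f"{alpha:.0%}: {row}")
```

Passing the tail type explicitly mirrors stage 4 of the procedure: the choice of one-tailed or two-tailed test must be made before the criterion can be stated.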


STAGES IN THE HYPOTHESIS TEST 


When carrying out a hypothesis test, it is useful to work through the following stages. This is
essentially the same procedure as in the tests for parameters of discrete distributions described
in Chapter 10.

1. State the variable being considered.
2. State the null hypothesis H₀ and the alternative hypothesis H₁.
   Remember that if you are looking for
   a decrease, then H₁: ... < ... (one-tailed test, lower tail)
   an increase, then H₁: ... > ... (one-tailed test, upper tail)
   a change, then H₁: ... ≠ ... (two-tailed test, upper and lower tails)
3. Consider the distribution of the test statistic, assuming that the null hypothesis is true.
   If you are testing a sample mean, then the test statistic is X̄, and the sampling distribution
   of means is considered.
4. State the type of test (i.e. whether it is one-tailed or two-tailed) and decide the significance
   level of the test.
5. Decide on your rejection criterion, remembering that you will reject H₀ if the test value
   lies in the critical (or rejection) region fixed by the significance level.
   Now consider the value of the test statistic.
6. Perform any calculations necessary to find out whether the test value is in the critical
   region.
7. Make your conclusion in statistical terms:
   • If the test value is in the critical region, reject H₀ in favour of H₁.
   • If the test value is not in the critical region, do not reject H₀.
   Then relate your conclusion to the situation being tested.

There are several hypothesis tests involving continuous distributions and some of these are
illustrated in the following text.


HYPOTHESIS TEST 1: TESTING μ (THE MEAN OF A POPULATION)


Consider a population X with unknown mean μ and variance σ².

A value for μ, call it μ₀, is specified in the hypotheses, for example

H₀: μ = μ₀
H₁: μ < μ₀ (or μ > μ₀ or μ ≠ μ₀)

To test the hypotheses, take a sample of size n from the population and calculate the sample
mean, x̄. The test statistic is X̄, and the sampling distribution of means is considered.

There are now several cases that may occur, depending on whether the population is normal
or not, whether the sample size is large or small and whether the population variance is
known or not.


Test 1a: Testing μ when the population X is normal and the variance σ² is known (any size sample)


Since the population is normally distributed, X ~ N(μ, σ²). The sampling distribution of
means, X̄, is also normal for all sample sizes, with mean μ₀ (as specified in the null hypothesis).
When testing the mean of a normal population X with known variance σ² for samples of
size n, the test statistic is X̄, where

$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$$

In standardised form, the test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}, \quad \text{where } Z \sim N(0, 1)$$


Example 11.1

Each year trainees throughout the country sit a test. Over a period of time it has been
established that the marks can be modelled by a normal distribution with mean 70 and
standard deviation 6.

This year it was thought that trainees from a particular county did not perform as well as
expected. The marks of a random sample of 25 trainees from the county were scrutinised and
it was found that their mean mark was 67.3.

Does this provide evidence, at the 5% significance level, that trainees from this county did not
perform as well as expected?


Solution 11.1

The stages of the hypothesis test are numbered and additional comments follow each stage.

1. State the variable. Let X be the mark of a trainee from the particular county and let the
population mean mark be μ.
Assuming that the standard deviation has not changed, then X ~ N(μ, σ²) with σ = 6.

2. State H₀ and H₁.
H₀: μ = 70 (The trainees have performed as expected)
H₁: μ < 70 (The trainees have not performed as well as expected)

3. State the distribution of the test statistic. The test is carried out based on the value of the
sample mean, x̄. The test statistic is X̄ and you need to consider the sampling distribution of
means. For samples of size n,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ with } \sigma = 6 \text{ and } n = 25$$

You now use the value of μ given by the null hypothesis. If H₀ is true, then μ = 70, so

$$\bar{X} \sim N\left(70, \frac{6^2}{25}\right), \text{ i.e. } \bar{X} \sim N(70, 1.44)$$

Note that the standard deviation is √1.44 = 1.2. The standard deviation is sometimes left in
its uncalculated form: σ/√n = 6/√25.

4. State the type of test. Use a one-tailed (lower tail) test at the 5% level.
Note that the test is one-tailed (lower tail) since you are looking for a decrease in μ.

5. State your rejection criterion. You need to find out whether the sample mean of 67.3
(known as the test value) lies in the critical region. To state your rejection criterion, find the
critical z-value for the 5% lower tail. This is −1.645 (see page 513).
Reject H₀ if z < −1.645, where

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{x} - 70}{6/\sqrt{25}}$$

6. Perform the calculation. The test value is x̄ = 67.3, so

$$z = \frac{67.3 - 70}{6/\sqrt{25}} = -2.25$$

[Figure: Z ~ N(0, 1) with the lower 5% tail below −1.645 shaded; the test value −2.25 lies in this tail.]

7. Make your conclusion. State the conclusion statistically (either reject H₀ or do not reject H₀)
and then relate it to the context of the question.

Since z < −1.645, H₀ is rejected in favour of H₁.

There is evidence, at the 5% level, that the trainees in this area have not performed as well as
expected.
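Stages 3 to 7 of Solution 11.1 reduce to a few lines of arithmetic. The sketch below is an illustrative check, not part of the original solution. It assumes Python 3.8+ and `statistics.NormalDist` for the critical value.

```python
from math import sqrt
from statistics import NormalDist

# Example 11.1: H0: mu = 70, H1: mu < 70, sigma = 6, n = 25, x-bar = 67.3
mu0, sigma, n, xbar, alpha = 70.0, 6.0, 25, 67.3, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))      # -2.7 / 1.2
z_crit = NormalDist().inv_cdf(alpha)      # lower-tail 5% critical value, about -1.645

print(f"z = {z:.2f}, critical z = {z_crit:.3f}")
print("Reject H0" if z < z_crit else "Do not reject H0")
```

The comparison z < z_crit encodes the lower-tail rejection criterion from stage 5; an upper-tail test would reverse the inequality and use `inv_cdf(1 - alpha)`.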


NOTE 1:
To find the critical region, calculate the critical x̄ value, c, as follows:

$$\frac{c - 70}{6/\sqrt{25}} = -1.645$$

$$c = 70 - 1.645 \times \frac{6}{\sqrt{25}} = 68.026$$

[Figure: distribution of X̄ centred at 70 with the critical region X̄ < 68.026 shaded.]

So the critical region is x̄ < 68.026. This means that any test value less than 68.026 would
result in the null hypothesis being rejected.


NOTE 2:

If you prefer to use the probability method to decide whether x̄ lies in the critical region, then,
since the significance level is 5%, the rejection criterion would be to reject H₀ if

P(X̄ < 67.3) < 0.05.

Now

$$P(\bar{X} < 67.3) = P\left(Z < \frac{67.3 - 70}{6/\sqrt{25}}\right) = P(Z < -2.25) = 1 - \Phi(2.25) = 1 - 0.9878 = 0.0122 = 1.2\%$$

[Figure: distribution of X̄ centred at 70 with the 1.2% tail below 67.3 (z = −2.25) shaded.]

Since P(X̄ < 67.3) < 0.05, reject H₀ (as before).

This method also tells you that H₀ would be rejected at any significance level above 1.2%.
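Both notes can be reproduced by working directly with the sampling distribution of X̄ under H₀ rather than with Z. A sketch, assuming Python 3.8+ and `statistics.NormalDist`; values use the unrounded 5% point, so they may differ from table-based working in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 70.0, 6.0, 25, 67.3, 0.05

# X-bar under H0: N(70, 6^2/25) = N(70, 1.44)
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

c = sampling_dist.inv_cdf(alpha)    # NOTE 1: critical value of X-bar (lower tail)
p_value = sampling_dist.cdf(xbar)   # NOTE 2: P(X-bar < 67.3)

print(f"critical region: x-bar < {c:.3f}")
print(f"P(X-bar < {xbar}) = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```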


Example 11.2

A sample of size 16 is taken from the distribution of X ~ N(μ, 3²) and a hypothesis test is
carried out at the 1% level of significance. On the basis of the value of the sample mean x̄, the
null hypothesis μ = 100 is rejected in favour of the alternative hypothesis μ > 100.

What can be said about the value of x̄?


Solution 11.2

In this question you are being asked to find the critical region in terms of x̄.
It is given that X ~ N(μ, σ²) with σ = 3.
The hypotheses are H₀: μ = 100
                   H₁: μ > 100

Considering the sampling distribution of means for samples of size n,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ with } \sigma = 3 \text{ and } n = 16$$

If H₀ is true, then μ = 100, so

$$\bar{X} \sim N\left(100, \frac{3^2}{16}\right)$$

Note that the standard deviation is σ/√n = 3/√16 (= 0.75).

Performing a one-tailed (upper tail) test at the 1% level, you are told that H₀ is rejected.

This means that the sample mean, x̄, must lie in the critical region, i.e. x̄ must be greater than
the critical value, c.

Working first in standardised values, the critical z-value that gives 1% in the upper tail is
z = 2.326 (see page 649). De-standardising to give c:

$$\frac{c - 100}{3/\sqrt{16}} = 2.326$$

$$c = 100 + 2.326 \times \frac{3}{\sqrt{16}} = 101.7445$$

[Figure: distribution of X̄ with the upper 1% tail (reject H₀) beyond c = 101.7445, corresponding to z = 2.326.]

So the critical region is x̄ > 101.7445.
Since the null hypothesis is rejected, the sample mean, x̄, is greater than 101.7445.
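The critical value for Example 11.2 comes from de-standardising the one-tailed 1% point, which is 2.326 in the critical value table (2.576 is the two-tailed 1% point). A minimal sketch, assuming Python 3.8+ and `statistics.NormalDist`. The unrounded z-value gives c ≈ 101.745, agreeing with the table-based working to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist

# Example 11.2: H0: mu = 100, H1: mu > 100, sigma = 3, n = 16, 1% level (upper tail)
mu0, sigma, n, alpha = 100.0, 3.0, 16, 0.01

se = sigma / sqrt(n)                       # 0.75
z_crit = NormalDist().inv_cdf(1 - alpha)   # one-tailed 1% point, about 2.326
c = mu0 + z_crit * se                      # reject H0 when x-bar > c

print(f"critical z = {z_crit:.3f}, critical region: x-bar > {c:.3f}")
```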


Test 1b: Testing μ when the population X is not normal, the variance σ² is known and the sample size n is large


Since the population is not normal, you cannot say that the distribution of X̄ is normal for all
sample sizes. If the sample size n is large, however, you can apply the central limit theorem
(see page 442). This states that for large samples taken from a non-normal population, the
sampling distribution of means X̄ is approximately normal, whatever the distribution of the
parent population.

When testing the mean of a non-normal population X with known variance σ², provided that
the sample size n is large,

the test statistic is X̄, where X̄ is approximately normal,

$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$$

in standardised form, the test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}, \quad \text{where } Z \sim N(0, 1)$$
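The central limit theorem claim can be illustrated by simulation. This sketch is hypothetical throughout. It assumes an exponential parent population with mean 2 (so σ = 2), samples of size 50, 2000 repetitions and a fixed random seed, none of which come from the text. If X̄ is close to N(μ, σ²/n), then about 95% of the sample means should lie within ±1.96 standard errors of μ.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical skewed (non-normal) parent population: exponential with mean 2, so sigma = 2
pop_mean, pop_sd, n, reps = 2.0, 2.0, 50, 2000

# Each entry is the mean of one random sample of size n
sample_means = [
    sum(random.expovariate(1 / pop_mean) for _ in range(n)) / n
    for _ in range(reps)
]

se = pop_sd / sqrt(n)  # CLT: X-bar is approximately N(mu, sigma^2 / n)
inside = sum(abs(m - pop_mean) < 1.96 * se for m in sample_means) / reps

print(f"mean of sample means = {mean(sample_means):.3f} (theory {pop_mean})")
print(f"sd of sample means   = {stdev(sample_means):.3f} (theory {se:.3f})")
print(f"proportion within 1.96 SE of mu = {inside:.3f} (about 0.95 if X-bar is near-normal)")
```

Increasing `n` makes the agreement with the normal model closer; with very small samples from a skewed population the approximation is poorer.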


Example 11.3 


The management of a large hospital states that the mean age of its patients is 45 years. 
Records of a random sample of 100 patients give a mean age of 48.4 years. Using a 
population standard deviation of 18 years, test at the 5% significance level whether there is 
evidence that the management’s statement is incorrect. State clearly your null and alternative 
hypotheses. (C) 
Solution 11.3

1. State the variable. Let X be the age, in years, of a patient and let the population mean age
be μ. The population standard deviation σ = 18.

2. State H₀ and H₁.
H₀: μ = 45 (The management’s claim is correct)
H₁: μ ≠ 45 (The management’s claim is incorrect)

3. State the distribution of the test statistic. You are performing a test based on a sample
mean, x̄, so you need to consider the sampling distribution of means, X̄.
The sample size is 100. Since n is large, by the central limit theorem, X̄ is approximately
normal, so

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ with } \sigma = 18 \text{ and } n = 100$$

If H₀ is true, then μ = 45, so

$$\bar{X} \sim N\left(45, \frac{18^2}{100}\right)$$

The standard deviation is σ/√n = 18/√100 = 1.8.

4. State the level of the test. Use a two-tailed test, at the 5% level.
The test is two-tailed since you are looking for a change in μ, not specifically an increase or a
decrease.

5. State your rejection criterion. The critical z-values for a two-tailed test at the 5% level are
±1.96 (see page 649). Remember that the 5% is shared evenly between the tails.
Reject H₀ if z < −1.96 or z > 1.96, i.e. if |z| > 1.96, where

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{x} - 45}{18/\sqrt{100}}$$

6. Perform the calculation. The test value is x̄ = 48.4, so

$$z = \frac{48.4 - 45}{18/\sqrt{100}} = 1.888\ldots$$

[Figure: Z ~ N(0, 1) with 2.5% shaded in each tail beyond ±1.96; the test value 1.888... lies between them.]

7. Make your conclusion. Since |z| < 1.96, do not reject H₀.
There is not sufficient evidence, at the 5% level of significance, to reject the management’s
claim that the mean age is 45 years.


Test 1c: Testing the mean μ of a population X where the variance σ² is unknown and the sample size n is large


The variance of the population, σ², is unknown, but, as you saw on page 447, an unbiased
estimate, σ̂², can be used instead, where

$$\hat{\sigma}^2 = \frac{n}{n-1}s^2 \quad (s^2 \text{ is the sample variance})$$

Alternative formats:

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 \quad \text{or} \quad \hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)$$

Ideally the population distribution should be normal, but if it is not, then the central limit
theorem can be applied, since the sample size is large.

When testing the mean of a population X with unknown variance σ², provided the sample size
n is large,

the test statistic is X̄, where X̄ is approximately normal,

$$\bar{X} \sim N\left(\mu_0, \frac{\hat{\sigma}^2}{n}\right)$$

In standardised form, the test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}, \quad \text{where } Z \sim N(0, 1)$$


Example 11.4

The packaging on an electric light bulb states that the average length of life of bulbs is
1000 hours. A consumer association thinks that this is an overestimate and tests a random
sample of 64 bulbs, noting the life, x hours, of each bulb.
The results are summarised as follows:

Σx = 63 910.4, Σx² = 63 824 061

(a) Calculate the sample mean, x̄.
(b) Calculate an unbiased estimate for the standard deviation of the length of life of all light
bulbs of this type.
(c) Is there evidence, at the 10% significance level, that the statement on the packaging is
overestimating the length of life of this type of light bulb?


Solution 11.4

(a) x̄ = Σx/n = 63 910.4/64 = 998.6 hours

(b)

$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) = \frac{1}{63}\left(63\,824\,061 - \frac{63\,910.4^2}{64}\right) = 49.77\ldots$$

σ̂ = √49.77... = 7.055 (3 d.p.)

(c) Perform the hypothesis test as follows:

1. State the variable. Let X be the lifetime, in hours, of a light bulb.
Let the population mean be μ and the population standard deviation be σ.

2. State H₀ and H₁.
H₀: μ = 1000 (The statement on the packaging is correct)
H₁: μ < 1000 (The statement on the packaging is overestimating the length of life)

3. State the distribution of the test statistic. For samples of size n, where n is large, by the
central limit theorem and using σ̂ for σ,

$$\bar{X} \sim N\left(\mu, \frac{\hat{\sigma}^2}{n}\right) \text{ approximately, with } n = 64 \text{ and } \hat{\sigma} = 7.055$$

If H₀ is true, μ = 1000, so

$$\bar{X} \sim N\left(1000, \frac{7.055^2}{64}\right)$$

4. State the level of the test. Use a one-tailed test, at the 10% level.

5. State your rejection criterion. The critical z-value for a one-tailed 10% test (lower tail) is
−1.282 (see page 649).
Reject H₀ if z < −1.282, where

$$z = \frac{\bar{x} - \mu}{\hat{\sigma}/\sqrt{n}} = \frac{\bar{x} - 1000}{7.055/\sqrt{64}}$$

6. Perform the required calculation. From the sample, x̄ = 998.6, so

$$z = \frac{998.6 - 1000}{7.055/\sqrt{64}} = -1.587\ldots$$

[Figure: Z ~ N(0, 1) with the lower 10% tail below −1.282 shaded; the test value −1.587 lies in this tail.]

7. Make your conclusion. Since z < −1.282, reject H₀ (the mean is 1000 hours) in favour of H₁
(the mean is less than 1000 hours).

There is evidence, at the 10% level, that the statement on the packaging overestimates the
length of life of this type of bulb.
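Parts (a) to (c) of Solution 11.4 can be computed directly from the two summary totals Σx and Σx². A sketch, assuming Python 3.8+ and `statistics.NormalDist`. Because it keeps σ̂ unrounded, z comes out as about −1.5876 rather than the −1.587... obtained with σ̂ = 7.055, which does not affect the conclusion.

```python
from math import sqrt
from statistics import NormalDist

# Example 11.4 summary statistics
n, sum_x, sum_x2 = 64, 63910.4, 63824061.0
mu0, alpha = 1000.0, 0.10                      # H0: mu = 1000, H1: mu < 1000

xbar = sum_x / n                                # (a) sample mean
var_hat = (sum_x2 - sum_x**2 / n) / (n - 1)     # (b) unbiased estimate of sigma^2
sd_hat = sqrt(var_hat)

z = (xbar - mu0) / (sd_hat / sqrt(n))           # (c) sd_hat used in place of sigma (n large)
z_crit = NormalDist().inv_cdf(alpha)            # lower-tail 10% point, about -1.282

print(f"x-bar = {xbar:.1f}, sd_hat = {sd_hat:.3f}, z = {z:.3f}")
print("Reject H0" if z < z_crit else "Do not reject H0")
```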


TYPE I AND TYPE II ERRORS


When you make your decision about whether or not to reject H₀ there are two types of error
that could be made. These were described in Chapter 10 (page 493) and are called Type I and
Type II errors:

– a Type I error is made when you wrongly reject a true null hypothesis,
– a Type II error is made when you wrongly accept a false null hypothesis.

These can be summarised in a table:

                                  Test decision
                          Accept H₀          Reject H₀
Actual      H₀ is true    ✓ (correct)        Type I error
situation   H₀ is false   Type II error      ✓ (correct)

A Type I error is made if H₀ is rejected when H₀ is true.
This is written

P(Type I error) = P(H₀ is rejected | H₀ is true).

If the significance level is α%, then the probability of rejecting H₀ when it is true is α%, so
the significance level of the test and the probability of making a Type I error are both the same.

∴ P(Type I error) = α%

A Type II error is made if H₀ is accepted when H₀ is false.
This is written

P(Type II error) = P(H₀ is accepted | H₀ is false).

To calculate the probability of a Type II error, a particular value must be specified in the
alternative hypothesis H₁.

So P(Type II error) = P(H₀ is accepted | H₁ is true)


POWER OF A TEST


The power of a test = P(reject H₀ when H₁ is true)
                    = 1 − P(Type II error)


Example 11.5

A random variable has a normal distribution with mean μ and standard deviation 3.

The null hypothesis μ = 20 is to be tested against the alternative hypothesis μ > 20 using a
random sample of size 25. It is decided that the null hypothesis will be rejected if the sample
mean is greater than 21.4.

(a) Calculate the probability of making a Type I error.
(b) Calculate the probability of making a Type II error, when in fact μ = 21.


Solution 11.5
(a) You are given that X ~ N(μ, 3²)

and H0: μ = 20
    H1: μ > 20

For samples of size 25, X̄ ~ N(μ, 3²/n) with n = 25.

If the null hypothesis is true, μ = 20 and X̄ ~ N(20, 3²/25).


P(Type I error) = P(H0 is rejected when H0 is true)
                = P(X̄ > 21.4 when μ = 20)
                = P(Z > (21.4 − 20)/(3/√25))
                = P(Z > 2.333)
                = 1 − Φ(2.333)
                = 1 − 0.9902
                = 0.0098
                ≈ 1%

[Figure: distribution of X̄ when μ = 20, with the critical region x̄ > 21.4 (z > 2.333) shaded.]


The probability of making a Type I error = 1%. 
NOTE: This probability is the significance level of the test. If H0 is rejected for values of the sample mean greater than 21.4, the significance level of the test is 1%.


(b) If, in fact, μ = 21, the hypotheses become
H0: μ = 20
H1: μ = 21


P(Type II error) = P(accept H0 when H0 is false)
                 = P(accept H0 when H1 is true)


You are given that H0 is rejected if X̄ > 21.4, so H0 will be accepted if X̄ < 21.4.


If H1 is true, then μ = 21 and X̄ ~ N(21, 3²/25).


so P(Type II error) = P(X̄ < 21.4 when μ = 21)
                    = P(Z < (21.4 − 21)/(3/√25))
                    = P(Z < 0.667)
                    = Φ(0.667)
                    = 0.7477
                    ≈ 75%


The probability of making a Type II error is 75%.

[Figure: distribution of X̄ when μ = 21, with the acceptance region x̄ < 21.4 (z < 0.667) shaded.]
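A short Python sketch reproduces the two error probabilities in Example 11.5 and the power of the test. It uses only the standard library. The helper name prob_reject is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 3, 25
se = sigma / sqrt(n)   # standard error of the sample mean: 3/5 = 0.6
cutoff = 21.4          # H0 (mu = 20) is rejected when the sample mean exceeds this

def prob_reject(mu):
    """P(sample mean > cutoff) when the true population mean is mu."""
    return 1 - NormalDist(mu, se).cdf(cutoff)

p_type1 = prob_reject(20)       # H0 true, but rejected
p_type2 = 1 - prob_reject(21)   # mu = 21 (H1 true), but H0 accepted
power = prob_reject(21)         # power = 1 - P(Type II error)

print(f"P(Type I) = {p_type1:.4f}, P(Type II) = {p_type2:.4f}, power = {power:.4f}")
```

The sketch works with unrounded z-values, so its answers (about 0.0098 and 0.7475) can differ in the last figure from values read from tables. Evaluating prob_reject over a range of values of μ traces out the power of the test as a function of the true mean.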


Exercise 11a  z-tests for a normal population or large sample size


(Tests 1a, 1b and 1c)


1. For each of the following, X follows a normal distribution with unknown mean μ and known standard deviation σ. A random sample of size n is taken from the population of X and the sample mean, x̄, is calculated.
Test the hypotheses stated, at the significance level indicated.

        n     σ      x̄     Hypotheses                        Level of significance
   (a)  30    …      …      H0: μ = 15.8,   H1: μ ≠ 15.8      5%
   (b)  …     …      …      H0: μ = 26.3,   H1: μ > 26.3      …
   (c)  …     4.2    …      H0: μ = 123.5,  H1: μ ≠ 123.5     1%
   (d)  …     0.18   …      H0: μ = 4.40,   H1: μ < 4.40      1%


2. A machine fills cans with soft drinks so that their contents have a nominal volume of 330 ml. Over a period of time it has been established that the volume of liquid in the cans follows a normal distribution with mean 335 ml and standard deviation 3 ml. A setting on the machine is altered, following which the operator suspects that the mean volume of liquid discharged by the machine into the cans has decreased. He takes a random sample of 50 cans and finds that the mean volume of liquid in these cans is 334.6 ml. Does this confirm his suspicion? Perform a significance test at the 5% level and assume that the standard deviation remains unchanged.


3. In a significance (hypothesis) test of the mean of a population, a null hypothesis μ = 103.5 is tested against an alternative hypothesis μ < 103.5, where μ is the mean of a normally distributed variable with known variance. A sample is taken from the population and a standardised normal test statistic z = −1.35 is calculated. What conclusion, at the 5% level of significance, can be reached about the mean of the population?


4. A machine packs flour into bags. A random sample of 11 filled bags was taken and the masses of the bags, to the nearest 0.1 g, were
1506.8, 1506.6, 1506.7, 1507.2, 1506.9,
1506.8, 1506.6, 1507.0, 1507.5, 1506.3, 150…
Filled bags are supposed to have a mass of 1506.5 g. Assuming that the mass of a bag has a normal distribution with variance 0.16 g², test whether the sample provides significant evidence, at the 5% level, that the machine produces overweight bags. (C)


5. A variable with known variance 32 is thought to have a mean of 55. A random sample of 81 observations of the variable has a mean of 56.2. Does this provide evidence at the 10% level of significance that the mean is not 55?


6. A random sample of 75 eleven-year-olds performed a simple task and the time taken, t minutes, was noted for each. The results were summarised as follows:
Σt = 1215,  Σt² = 21 708.
Test, at the 1% level, whether there is evidence that the mean time taken to perform the task is greater than 15 minutes.
Explain what part the central limit theorem has played in your calculations.


7. Cassette tapes manufactured by a particular firm are such that the playing time of the tapes can be modelled by a normal distribution with standard deviation 1.8 minutes. The tapes are advertised as having a playing time of 90 minutes, but the manufacturer claims that they actually have a mean playing time of 92 minutes. An investigator selected 36 tapes at random and checked the playing times. He calculated the mean playing time of the tapes in the sample and, on the basis of the value obtained, he rejected the manufacturer's claim at the 5% level, saying that the mean time was less than 92 minutes.
(a) What can be said about the value of the sample mean for this decision to be taken?
The mean playing time of the firm's cassettes is in fact 90.8 minutes.
(b) Find the probability of making a Type II error.
(c) Find the probability that a cassette tape, picked at random, has a playing time less than the stated value on the pack of 90 minutes.


8. A sample of 40 observations from a normal distribution X gave Σx = 24, Σx² = 596. Performing a two-tailed test at the 5% level, test whether the mean of the distribution is zero.


9. The masses of components produced at a particular workshop are normally distributed with standard deviation 0.8 g. It is claimed that the mean mass is 6 g. To test this claim the mean mass of a random sample of 50 components is calculated and a significance test at the 5% level carried out. On the basis of the test, the claim is accepted. Between what values did the mean mass of the 50 components in the sample lie?


10. For each of the following distributions, X, a random sample of size n is taken and the values of Σx, Σx² or Σ(x − x̄)² summarised as shown. Test the hypotheses stated at the significance level indicated.

        n     Σx      Σx² or Σ(x − x̄)²   Hypotheses                      Level of significance
   (a)  65    6500    650 842.4            H0: μ = 99.2,  H1: μ ≠ 99.2     5%
   (b)  65    6500    650 842.4            H0: μ = 99.2,  H1: μ > 99.2     9%
   (c)  80    6824    2508.8               H0: μ = 86.2,  H1: μ < 86.2     10%
   (d)  100   685     4728.25              H0: μ = 7,     H1: μ ≠ 7        1%


11. A large random sample was taken from a population with mean μ and known variance. The null hypothesis μ = 52 was tested against the alternative hypothesis μ ≠ 52 at the 4% significance level. The calculated value of the standardised test statistic was 2.19.
(a) Carry out a significance test for μ based on this result, stating your conclusion clearly.
(b) State the probability of making a Type I error.


12. A sample of size 15 is taken from the distribution of X where X ~ N(μ, 4). If the sample mean is greater than 10.72, the null hypothesis μ = 10 is rejected in favour of the alternative hypothesis μ > 10.
(a) Find the probability of making a Type I error.
(b) Find the probability of making a Type II error if μ = 10.9.


13. An IQ test is developed such that the mean quotient is 100 and the standard deviation is 12. It is given to a random sample of 50 children in one area. The average mark was 105. Does this provide evidence, at the 5% level, that children from this area are generally more intelligent?


14. Boxes of a certain breakfast cereal have contents whose masses are normally distributed with mean μ g and standard deviation 15 g. A test of the null hypothesis μ = 375 against the alternative hypothesis μ > 375 is carried out at the 2½% significance level using a random sample of 16 boxes.
(a) Show that the alternative hypothesis is accepted when x̄ > 382.35, where x̄ g is the sample mean mass.
(b) Given that the actual value of μ is 385, find the probability of making a Type II error.
(c) Find the range of values of μ for which the probability of making a Type II error is less than 0.025.
(d) The test is carried out, independently, on two different occasions. Find the probability that at least one Type I error is made. (C)

Test 1d: Testing the mean μ when the population X is normal but the
variance σ² is unknown and the sample size n is small


In this case, the population is normal, so X ~ N(μ, σ²). Since σ² is unknown, σ̂² is used instead (as in Test 1c on page 519).


Consider the distribution of the sample mean X̄. Because the population is normal, X̄ is itself normally distributed. However, when σ² is replaced by its estimate σ̂² and the sample size is small, the standardised statistic no longer follows a normal distribution. As you saw in Chapter 9 (page 462), this standardised statistic is called T and it follows a t-distribution with (n − 1) degrees of freedom.


When testing the mean of a normal population X with unknown variance σ², when the sample size n is small,

the test statistic is T, where T = (X̄ − μ)/(σ̂/√n) and T ~ t(n − 1).


When finding the critical t-values you need the t-distribution tables, which are printed on page 650. You may need to remind yourself how to use them by rereading the notes on page 464.


Example 11.6 


Five readings of the resistance X, in ohms, of a piece of wire gave the following results: 


1.51, 1.49, 1.54, 1.52, 1.54
These are summarised by Σx = 7.6, Σx² = 11.5538.


If the wire is pure, the resistance is 1.50 ohms. If the wire is impure, its resistance is higher than 1.50 ohms. Assuming that the resistance can be modelled by a normal variable with mean μ and standard deviation σ, calculate
(a) the sample mean, x̄,
(b) an unbiased estimate of σ.


(c) Is there evidence, at the 5% level of significance, that the wire is impure?


Solution 11.6


1. Define the variable.  Let X be the resistance, in ohms, of the wire.
Let the population mean be μ, and the population standard deviation be σ.
Then X ~ N(μ, σ²).


(a) x̄ = Σx/n = 7.6/5 = 1.52 ohms

(b) σ̂² = (1/(n − 1)) (Σx² − (Σx)²/n)
       = (1/4) (11.5538 − 7.6²/5)
       = 0.000 45

so σ̂ = 0.0212 (3 s.f.)


(c) H0: μ = 1.50 (the wire is pure silver)
    H1: μ > 1.50 (the wire is impure)


3. State the distribution of the test statistic.  If H0 is true, μ = 1.50. Since n is small and σ² is unknown, the test statistic is T, where
T = (X̄ − 1.50)/(σ̂/√n) and T ~ t(n − 1),
i.e. T = (X̄ − 1.50)/(0.0212/√5) and T ~ t(4).


Use a one-tailed test (upper tail) at the 5% level.


From the tables on page 650, the critical value for t is found from row v = 4, p = 0.95, giving 2.132.

p        0.75    0.90    0.95    0.975
v = 1    1.000   3.078   6.314   12.71
⋮
v = 4    0.741   1.533   2.132   2.776

Reject H0 if the test value of t is greater than 2.132.

From the sample, x̄ = 1.52.
t = (1.52 − 1.50)/(0.0212/√5) = 2.109...

Since t < 2.132, H0 is not rejected.


There is not enough evidence, at the 5% level, to indicate that the wire is impure.
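A minimal Python sketch of the calculation in Example 11.6 follows. statistics.stdev uses the divisor n − 1, so it returns σ̂ directly. The critical value 2.132 is copied from the t-table rather than computed, because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

readings = [1.51, 1.49, 1.54, 1.52, 1.54]  # resistances in ohms (Example 11.6)
mu0 = 1.50                                 # H0: mu = 1.50, H1: mu > 1.50
n = len(readings)

xbar = mean(readings)                      # 1.52
sigma_hat = stdev(readings)                # divisor n - 1: square root of the unbiased variance estimate
t = (xbar - mu0) / (sigma_hat / sqrt(n))   # test value of T, which follows t(4) under H0

T_CRIT = 2.132  # upper 5% point of t(4), copied from the table (v = 4, p = 0.95)

print(f"xbar = {xbar:.3f}, sigma_hat = {sigma_hat:.4f}, t = {t:.3f}")
print("reject H0" if t > T_CRIT else "do not reject H0")
```

With the unrounded σ̂ the test value is about 2.108 rather than 2.109. The decision is unchanged.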


Example 11.7 


A machine is supposed to produce paper with a mean thickness of 0.05 mm. Eight pieces of its paper, selected at random, have a mean thickness of 0.047 mm with a standard deviation of 0.002 mm. Assuming that the thickness of the paper produced by the machine is normally distributed, test at the 1% level whether the output from the machine is different from that expected.


Solution 11.7


Let X be the thickness, in millimetres, of paper produced by the machine.
Let the population mean be μ and the population standard deviation be σ.
Then X ~ N(μ, σ²).


Since σ² is unknown, the unbiased estimate σ̂² is used, where
σ̂² = n/(n − 1) × s² (where s² is the sample variance)
   = 8/7 × 0.002²
   = 4.57... × 10⁻⁶

σ̂ = 0.002 14 (3 s.f.)


H0: μ = 0.05 (the thickness is as expected)
H1: μ ≠ 0.05 (the thickness is different from that expected)


If H0 is true, μ = 0.05.
Since n is small and σ² is unknown, the test statistic is T, where
T = (X̄ − 0.05)/(σ̂/√n) and T ~ t(n − 1),
i.e. T = (X̄ − 0.05)/(0.002 14/√8) and T ~ t(7).


4. State the level of the test.  Use a two-tailed test at the 1% level.


5. Decide on your rejection criterion.  The critical value for t is found from row v = 7, p = 0.995 (because you want 0.5% in each tail), giving ±3.499.

p        0.75    0.90    0.95    0.975   0.99    0.995
v = 1    1.000   3.078   6.314   12.71   31.82   63.66
v = 2    0.816   1.886   2.920   4.303   6.965   9.925
⋮
v = 7    0.711   1.415   1.895   2.365   2.998   3.499

Reject H0 if t < −3.499 or t > 3.499, i.e. if |t| > 3.499.


6. Perform the required calculation.  From the sample, x̄ = 0.047.
t = (0.047 − 0.05)/(0.002 14/√8) = −3.965...

[Figure: T ~ t(7), with 0.5% critical regions below −3.499 and above 3.499.]


7. Make your conclusion.  Since |t| > 3.499, H0 is rejected.


There is evidence, at the 1% level, that the output from the machine is different from that
expected.
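When only summary figures are available, as in Example 11.7, the same test can be sketched in Python from n, x̄ and s. The function name t_from_summary is hypothetical. Like the solution, it assumes the quoted standard deviation s was computed with divisor n.

```python
from math import sqrt

def t_from_summary(n, xbar, s, mu0):
    """One-sample t statistic from summary figures.

    s is the sample standard deviation with divisor n, so it is first
    converted to sigma_hat, the root of the unbiased variance estimate.
    """
    sigma_hat = sqrt(n / (n - 1)) * s
    t = (xbar - mu0) / (sigma_hat / sqrt(n))
    return t, sigma_hat

t, sigma_hat = t_from_summary(n=8, xbar=0.047, s=0.002, mu0=0.05)
T_CRIT = 3.499  # two-tailed 1% critical value of t(7), from the table (v = 7, p = 0.995)

print(f"sigma_hat = {sigma_hat:.5f}, t = {t:.3f}")
print("reject H0" if abs(t) > T_CRIT else "do not reject H0")
```

Using the unrounded σ̂ gives t ≈ −3.969 instead of −3.965. In both cases |t| > 3.499.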


Exercise 11b  t-tests for a normal population, small sample size


(Test 1d)


1. For each of the following, X follows a normal distribution with unknown mean μ and unknown standard deviation σ. A random sample of size n is taken from the population of X and the sample mean, x̄, is calculated.

Test the hypotheses stated, at the significance level indicated.

        n     Σx       Σx² or Σ(x − x̄)²   Hypotheses                        Level
   (a)  12    298.8    7942.42              H0: μ = 24.4,   H1: μ > 24.4      5%
   (b)  17    605.2    23 016.92            H0: μ = 40,     H1: μ ≠ 40        5%
   (c)  6     9034.8   50.8                 H0: μ = 1503,   H1: μ ≠ 1503      10%
   (d)  10    1298     97.6                 H0: μ = 133.0,  H1: μ < 133.0     1%


2. An athlete finds that her times for running a race are normally distributed with mean 10.6 seconds. She trains intensively for a week and then records her time in the next 6 races. Her times, in seconds, are

10.70, 10.65, 10.75, 10.80, 10.60

Is there evidence, at the 5% level, that training intensively has improved her times?


3. Family packs of bacon slices are sold in 1.5 kg packs. A sample of 12 packs was selected at random and their masses, measured in kilograms, noted. The following results were obtained:

Σx = 17.81, Σx² = 26.4357

Assuming that the masses of packs follow a normal distribution with variance σ², test at the 1% level whether the packs are underweight,

(a) if σ² is unknown,   (b) if σ² = 0.0003.


4. It is thought that a normal population has mean 1.6. A random sample of 10 observations gives a mean of 1.49 and standard deviation of 0.3. Does this provide evidence, at the 5% level, that the population mean is less than 1.6?


5. A random sample of 8 observations of a normal variable gave

Σx = 36.5, Σ(x − x̄)² = 0.74

Test, at the 5% level, the hypothesis that the mean of the distribution is 4.3 against the alternative hypothesis that the mean is greater than 4.3.


6. The cholesterol levels of 8 women were measured, with the following results:
3.1, 2.8, 1.5, 1.7, 2.4, 1.9, 3.3, 1.6

Making any necessary assumptions,

(a) test, at the 5% level, whether the sample has been drawn from a distribution with mean cholesterol level 3.1,
(b) calculate a symmetric 95% confidence interval for the mean cholesterol level.


7. A marmalade manufacturer produces thousands of jars of marmalade each week. The mass of marmalade in a jar is an observation from a normal distribution having mean 455 g and standard deviation 0.8 g.

Following a slight adjustment to the filling machine, a random sample of 10 jars is found to contain the following masses, in grams, of marmalade:

454.8, 453.8, 455.0, 454.4, 455.4,
454.4, 454.4, 455.0, 455.0, 453.6

(a) Assuming that the variance of the distribution is unaltered by the adjustment, test at the 5% significance level the hypothesis that there has been no change in the mean of the distribution.
(b) Assuming that the variance of the distribution may have been altered, obtain an unbiased estimate of the new variance and, using this estimate, test at the 5% level of significance the hypothesis that there has been no change in the mean of this distribution. (C)


8. Six observations of a continuous random variable X gave the following values:
120.3, 122.4, 119.8, 121.0, 122.5, 119.6

State any conditions that are necessary for the valid use of a t-test to test a hypothesis about the mean of X.

Assuming that the use of the test is valid, test the null hypothesis that the mean of X is 120 against the alternative hypothesis that the mean is not 120, using a 5% significance level.


9. A random sample of 12 independent observations of a normally distributed random variable X is taken from a population and a test statistic, t = 2.9, calculated. It is thought that the population mean μ is 27. Write down suitable null and alternative hypotheses to carry out a two-tailed significance test for μ and use a t-test to test your hypotheses at the 1% level.


10. A firm of solicitors claims that, on average, interviews with clients last 50 minutes. A random sample of 15 interviews is chosen, and the time taken for each interview, x minutes, is noted. The results are summarised by Σx = 746 and Σx² = 37 180. Assuming that the time for an interview has a normal distribution, use a t-test to determine, at the 5% significance level, whether the firm is overstating the average interview time. Give null and alternative hypotheses, full details of your procedure and a conclusion. (C)


HYPOTHESIS TEST 2: TESTING A BINOMIAL PROPORTION p WHEN n
IS LARGE


Consider the situation when independent trials are carried out, each with a probability p of success, where p is constant. If X is the number of successes in n trials, then X follows a binomial distribution, i.e. X ~ B(n, p) (page 279).


In Chapter 10 (hypothesis tests for discrete variables, page 483) you learnt how to carry out a hypothesis test for an unknown binomial proportion p. This involved calculating binomial probabilities, which are relatively easy to find when n is small.


When n is large, however, the calculations can become very cumbersome. In such cases it is useful to use the normal approximation to the binomial distribution:
If n is sufficiently large such that np > 5 and nq > 5, then the binomial distribution
X ~ B(n, p)
can be approximated by a normal distribution
X ~ N(np, npq), where q = 1 − p.


When performing the hypothesis test, you are able to work in standardised z-values. Since the normal distribution is continuous and the binomial is discrete, you will need to use a continuity correction (see page 383). This involves amending your test value by adding or subtracting 0.5. Further details are given in the following examples. The stages of the test are the same as in the general procedure outlined on page 513.

When testing the proportion p of a binomial population,
the test statistic is X, the number of successes in n trials, where X ~ B(n, p).
When n is large such that np > 5 and nq > 5, X is approximately normal and
X ~ N(np, npq).

In standardised form,
the test statistic is Z = (X − np)/√(npq), where Z ~ N(0, 1).
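The following Python sketch shows how good the normal approximation with a continuity correction is. It compares the approximation with the exact binomial tail probability P(X ≥ x), summed term by term with math.comb. The function names are illustrative. B(100, 0.5), the null distribution in Example 11.8, is used purely as an example.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_upper_tail(n, p, x):
    """Exact P(X >= x) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

def normal_upper_tail(n, p, x):
    """Normal approximation to P(X >= x), standardising the corrected value x - 0.5."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return 1 - NormalDist(mean, sd).cdf(x - 0.5)

# Compare exact and approximate upper tails of B(100, 0.5) at a few values of x.
for x in (55, 57, 59):
    exact = binom_upper_tail(100, 0.5, x)
    approx = normal_upper_tail(100, 0.5, x)
    print(f"x = {x}: exact {exact:.4f}, normal approximation {approx:.4f}")
```

Here np = nq = 50, well above 5, and the two columns agree to about two decimal places. Dropping the 0.5 correction makes the agreement noticeably worse.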


Example 11.8


Caroline was asked to test whether a coin is biased in favour of heads, using a 5% level of significance. She tossed the coin 100 times and obtained 57 heads. What should she have concluded?


Solution 11.8


1. Define the variable.  Let X be the number of heads in 100 tosses and let the probability of obtaining a head be p. Then X ~ B(100, p).


2. State H0 and H1.  H0: p = 0.5 (the coin is not biased and heads or tails are equally likely to occur)
H1: p > 0.5 (the coin is biased in favour of heads)


If H0 is true, then p = 0.5, so X ~ B(100, 0.5).

Now n (the number of tosses) is large and np = 100 × 0.5 = 50 > 5, nq = 50 > 5.
Since np > 5 and nq > 5, use the normal approximation,

X ~ N(np, npq) with np = 50 and npq = 100 × 0.5 × 0.5 = 25,

i.e. X ~ N(50, 25)

s.d. = √(npq) = √25 = 5


Use a one-tailed test at the 5% level.


The critical z-value for the 5% critical region (upper tail) is 1.645.
Reject H0 if z > 1.645, where z is the sample value when standardised.


When standardising the sample value of 57 heads you have to use a continuity correction. Think of the discrete value of 57 being represented by a rectangle over the continuous values from 56.5 to 57.5.


To reject H0, the complete rectangle must lie in the critical region. Therefore take the lower boundary, 56.5, as the test value.

z = (56.5 − np)/√(npq)
  = (56.5 − 50)/5
  = 1.3

[Figure: X ~ N(50, 25) with the 5% upper-tail critical region beyond z = 1.645; the test value 56.5 (z = 1.3) lies outside it.]


Since z < 1.645, the sample value of 57 heads is not in the critical region and H0 is not rejected.


On the statistical evidence, Caroline should have concluded that there is not enough evidence, at the 5% level, that the coin is biased in favour of heads.


It is interesting to work out how many heads would need to be obtained to conclude that the 
coin is biased in favour of heads. This can be done as follows: 


The standardised test value lies in the critical region if z> 1.645. 


If the number of heads is x then, applying the continuity correction, you need to consider x − 0.5 when standardising the test value.


Therefore   (x − 0.5 − np)/√(npq) > 1.645
i.e.        (x − 0.5 − 50)/5 > 1.645
            x > 50 + 0.5 + 1.645 × 5
            x > 58.725
Since x is an integer, the least value of x is 59. So if Caroline had obtained 59 or more heads when she tossed the coin 100 times, she would have concluded that the coin was biased in favour of heads.
NOTE: This result is perhaps surprising. Would you have thought that more heads would be 
needed? 
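Both calculations in Example 11.8 can be sketched in Python: the test value for 57 heads and the least number of heads that would lead to rejection. The critical value comes from statistics.NormalDist, so it is 1.6449 rather than the rounded 1.645 used above. This does not change the answer.

```python
from math import sqrt
from statistics import NormalDist

n, p0, alpha = 100, 0.5, 0.05
mean, sd = n * p0, sqrt(n * p0 * (1 - p0))  # 50 and 5 under H0
z_crit = NormalDist().inv_cdf(1 - alpha)    # upper 5% point, about 1.645

def z_value(x):
    """Standardised test value for x heads, using the continuity-corrected x - 0.5."""
    return (x - 0.5 - mean) / sd

print(round(z_value(57), 3))  # Caroline's 57 heads

# Smallest number of heads whose corrected z-value lies in the critical region.
min_heads = next(x for x in range(n + 1) if z_value(x) > z_crit)
print(min_heads)
```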


Example 11.9 


A manufacturer claims that a particular brand of seeds has a 90% germination rate. To test 
this claim, 150 randomly selected seeds are planted and it is noted that 124 germinate. Does 
this provide evidence, at the 1% significance level, that the manufacturer is overstating the 
germination rate of the seeds? 


Solution 11.9
1. Define the variable.  Let X be the number of seeds that germinate in 150 and let the probability that a seed germinates be p. Then X ~ B(150, p).
2. State H0 and H1.  H0: p = 0.9 (the germination rate is 90%)
H1: p < 0.9 (the germination rate is less than 90% and the manufacturer is overstating the rate)
If H0 is true, then p = 0.9, so X ~ B(150, 0.9).
Since n is large, check whether the normal approximation can be used.
Now np = 150 × 0.9 = 135 > 5 and nq = 150 × 0.1 = 15 > 5.
Since np > 5 and nq > 5, use the normal approximation,
X ~ N(np, npq) with npq = 150 × 0.9 × 0.1 = 13.5,
i.e. X ~ N(135, 13.5)


Use a one-tailed test at the 1% level. 


The critical z-value for the critical region (lower tail) is -2.326. 


So H0 is rejected if your test value, when standardised, is less than −2.326. The test value is 124. With the continuity correction in this lower-tail test, you want to see whether the complete rectangle for 124 lies in the critical region, so you need to standardise 124.5.


Reject H0 if the standardised value of 124.5 is less than −2.326.


6. Perform the required calculation.

z = (124.5 − np)/√(npq)      (use 124.5 as the test value)
  = (124.5 − 135)/√13.5
  = −2.857...

[Figure: X ~ N(135, 13.5) with the 1% lower-tail critical region below z = −2.326; the test value 124.5 (z = −2.857) lies inside it.]


Since z < −2.326, the sample value is in the critical region and so H0 (the germination rate is 90%) is rejected in favour of H1 (the germination rate is less than 90%).


There is evidence, at the 1% level, that the manufacturer is overstating the germination rate.
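Here is a Python sketch of Example 11.9. It also computes the exact binomial probability P(X ≤ 124) under H0. That exact figure is only a check on the approximation and is not part of the method above.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p0, observed = 150, 0.9, 124
mean, var = n * p0, n * p0 * (1 - p0)    # 135 and 13.5 under H0

z = (observed + 0.5 - mean) / sqrt(var)  # lower-tail test: standardise 124.5
z_crit = NormalDist().inv_cdf(0.01)      # about -2.326

# Exact lower-tail probability P(X <= 124) under B(150, 0.9), for comparison.
exact = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(observed + 1))

print(f"z = {z:.3f} (critical value {z_crit:.3f}), exact P(X <= 124) = {exact:.4f}")
print("reject H0" if z < z_crit else "do not reject H0")
```

Because B(150, 0.9) is noticeably skewed, the exact tail probability need not match the normal-curve area closely. Comparing the two is a quick way to judge whether the approximation is adequate.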




Example 11.10 


The random variable X can be modelled by a binomial distribution with parameters n = 200 and p, where the value of p is unknown. A significance test is performed, based on a sample value x, to test the null hypothesis p = 0.4 against the alternative hypothesis p < 0.4. The probability of making a Type I error when performing this test is 0.05.


(a) Find the critical region for x. 


(b) Find the probability of making a Type II error in the case when p = 0.3.


Solution 11.10 


(a) You are given that X ~ B(200, p).
The hypotheses are H0: p = 0.4
                    H1: p < 0.4
If H0 is true, then p = 0.4, so X ~ B(200, 0.4).
Now np = 200 × 0.4 = 80 and nq = 200 × 0.6 = 120.
Since np > 5 and nq > 5, use the normal approximation,
X ~ N(np, npq) with np = 80, npq = 200 × 0.4 × 0.6 = 48,
so X ~ N(80, 48).
You are given that P(Type I error) = 0.05.


Since P(Type I error) = P(H0 is rejected when H0 is true), which is the significance level of the test, the significance level of the test is 5%.


For a one-tailed test at the 5% level, the critical z-value for the critical region (lower tail) is −1.645. So reject H0 if the sample value, when standardised, is less than −1.645.


To find the critical region for x, remember that a continuity correction is needed. You are considering values in the lower tail and want to include the complete rectangle representing x, so use x + 0.5.

(x + 0.5 − 80)/√48 < −1.645
x < 80 − 0.5 − 1.645 × √48
x < 68.10...

Since x is an integer, the critical region is x ≤ 68.

Check:
When x = 68, z = (68.5 − 80)/√48 = −1.659... < −1.645, so 68 is in the critical region.
When x = 69, z = (69.5 − 80)/√48 = −1.515... > −1.645, so 69 is not in the critical region.
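A direct search over x also finds the critical region in part (a), as this Python sketch shows. Following the method above, it standardises x + 0.5 for this lower-tail test. It also reports the exact size P(X ≤ 68) under B(200, 0.4), so that you can compare it with the nominal 5%.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p0, alpha = 200, 0.4, 0.05
mean, sd = n * p0, sqrt(n * p0 * (1 - p0))  # 80 and sqrt(48) under H0
z_crit = NormalDist().inv_cdf(alpha)        # about -1.645

def z_value(x):
    """Continuity-corrected z-value for a lower-tail test (uses x + 0.5)."""
    return (x + 0.5 - mean) / sd

# Largest x whose corrected z-value lies in the critical region.
boundary = max(x for x in range(n + 1) if z_value(x) < z_crit)

# Exact probability of the region X <= boundary under B(200, 0.4), for comparison.
exact_size = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(boundary + 1))

print(boundary, round(z_value(68), 3), round(z_value(69), 3), round(exact_size, 4))
```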
(b) If p = 0.3, the hypotheses become

H0: p = 0.4
H1: p = 0.3

From part (a), the critical region is X ≤ 68, so H0 is accepted when X > 68.


P(Type II error) = P(H0 is accepted when H1 is true)
                 = P(X > 68 when p = 0.3)


When p = 0.3, np = 200 × 0.3 = 60 and npq = 200 × 0.3 × 0.7 = 42.
Therefore X ~ N(60, 42).
The conditions np > 5, nq > 5 are satisfied, so the normal approximation can be applied.


Now P(X > 68) ≈ P(X > 68.5) (continuity correction), so

P(Type II error) = P(X > 68.5 when X ~ N(60, 42))
                 = P(Z > (68.5 − 60)/√42)
                 = P(Z > 1.312)
                 = 1 − 0.9052
                 = 0.0948 (3 s.f.)

[Figure: distribution given by H1, X ~ N(60, 42), with the region x > 68.5 (z > 1.312) shaded.]


Exercise 11c  Testing a binomial proportion, large n


1. In the following, X ~ B(n, p) with n as shown. p is unknown and x is the number of successes in the sample.
Test the hypotheses stated at the level of significance indicated.

        n     x     Hypotheses                       Level
   (a)  50    45    H0: p = 0.8,   H1: p > 0.8       5%
   (b)  60    42    H0: p = 0.55,  H1: …             2%
   (c)  120   21    H0: p = 0.25,  H1: p ≠ 0.25      …
   (d)  300   213   H0: p = 0.65,  H1: p ≠ 0.65      1%
   (e)  90    56    H0: p = 0.76,  H1: p < 0.76      1%


2. A manufacturer claims that 8 out of 10 dogs prefer its brand of dog food to any other. In a random sample of 120 dogs, it was found that 85 appeared to prefer that brand. Test, at the 5% level, whether you would accept the manufacturer's claim.


3. In a survey it was found that 3 out of every 10 people supported a particular political party. A month later a party representative claimed that the popularity of the party had increased. Would you accept that the number who supported the party was still 3 out of 10 if a further survey … of 100 people …? … the 3% level.


4. A large college claims that it admits equal numbers of men and women. In a random sample of 500 students at the college there were 367 males. Is this evidence, at the 5% level, that the college population is not evenly divided between males and females?


5. A theory predicts that the probability of an event is 0.4. The theory is tested experimentally and in 400 independent trials, the event occurred 140 times. Is the number of occurrences significantly less than that predicted by the theory? Test at the 1% level.


6. It is thought that the proportion of defective items produced by a particular machine is 0.1. A random sample of 100 items is inspected and found to contain 15 defective items. Does this provide evidence, at the 5% level, that the machine is producing more defective items than expected?


7. In an investigation into the ownership of mobile phones amongst school children, 200 randomly chosen school children were interviewed and 142 owned a mobile phone. Test, at the 5% level of significance, the hypothesis that 65% of school children own a mobile phone against the alternative hypothesis that more than 65% own a mobile phone.


8. (a) A gardener sows 150 Special cabbage seeds and knows that the germination rate is 75%. By using a suitable approximation find the probability that:
(i) more than 122 seeds germinate,
(ii) fewer than 106 seeds germinate.
(b) The gardener also sows 120 Everyday cabbage seeds and finds that 81 germinate. Test whether the Everyday seeds have a germination rate less than 75%. Perform a significance test at the 4% level.


9. A government report states that a third of teenagers in Great Britain belong to a youth organisation. A survey, conducted among a random sample of 1000 teenagers from a certain city, revealed that … belong to a youth organisation. Does this provide evidence, at the 2% level, that the proportion of teenagers in this city who belong to a youth organisation is greater than the national average?


10. A questionnaire was sent to a large number of people, asking for their opinions about a proposal to alter an examination syllabus. Of the 180 replies received, 134 were in favour of the proposal. Stating a necessary assumption,
(a) test, at the 5% level, the hypothesis that the population proportion in favour of the proposal is 0.7 against the alternative hypothesis that it is more than 0.7,
(b) find a symmetric 95% confidence interval for the population proportion in favour of the proposal. (C)


11. Over a long period of time it has been found that in Enrico's restaurant the ratio of non-vegetarian to vegetarian meals ordered is 3 to 1. During one particular day at Enrico's restaurant, a random sample of 20 people contained two who ordered a vegetarian meal.
(a) Carry out a significance test to determine whether or not the proportion of vegetarian meals ordered that day is lower than usual. State clearly your hypotheses and use a 10% significance level. Use an exact binomial test.
In Manuel's restaurant, of a random sample of 100 people ordering meals, 31 ordered vegetarian meals.
(b) Set up null and alternative hypotheses and, using a suitable approximation, test whether or not the proportion of people eating vegetarian meals at Manuel's restaurant is different from that at Enrico's restaurant. Use a 5% level of significance. (L)


12. When a drawing pin is dropped on to the floor, the probability that it lands point up is p.
(a) A teacher drops a drawing pin 900 times and observes that it lands point up 315 times. Test, at the 1% level, the hypothesis that p = 0.4 against the alternative hypothesis p < 0.4.
(b) A student drops a drawing pin 600 times and observes that it lands point up 251 times. Using the student's results, find a symmetric 95% confidence interval for p.
(c) As part of a Statistics investigation, 1500 students carry out similar experiments and they each calculate (correctly) their own symmetric 95% confidence interval for p. Find the expected number of these intervals that do not contain the true value of p. (C)


13. After carrying out a survey, a market research company asserted that 75% of TV viewers watched a certain programme. Another company interviewed 75 viewers and found that 51 had watched the programme and 24 had not. Does this provide evidence, at the 5% significance level, that the first company's figure of 75% was incorrect?


14. The Paper Engineering Company has traditionally supplied 85% of the retail outlets for origami products. With the onset of increased competition they feared that this proportion might have fallen. They examined a random sample of 500 retail outlets and found that 405 of them sold Paper Engineering Company products. Use a normal approximation to the binomial distribution to carry out a hypothesis test at the 1% significance level to test whether or not their proportion of the retail outlets has fallen. Give suitable null and alternative hypotheses and state your conclusion clearly. (C)


HYPOTHESIS TEST 3: TESTING $\mu_1 - \mu_2$, THE DIFFERENCE BETWEEN MEANS OF TWO NORMAL POPULATIONS

This test is used when you have two normal populations $X_1$ and $X_2$ with unknown means $\mu_1$ and $\mu_2$, and you want to test the difference between the means of these populations. Consider

$$X_1 \sim N(\mu_1, \sigma_1^2) \quad \text{and} \quad X_2 \sim N(\mu_2, \sigma_2^2).$$

The hypotheses might be:

$H_0: \mu_1 - \mu_2 = c$
$H_1: \mu_1 - \mu_2 > c$ (or $\mu_1 - \mu_2 < c$ or $\mu_1 - \mu_2 \ne c$)

where $c$ is a constant. Often the test involves the null hypothesis that the means are the same, i.e. $\mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$, so the null hypothesis would be $H_0: \mu_1 - \mu_2 = 0$.

To test the difference between the means, take a random sample of size $n_1$ from $X_1$ and work out its sample mean, $\bar{x}_1$. Also take a random sample of size $n_2$ from $X_2$ and work out its sample mean, $\bar{x}_2$.

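The test statistic is $\bar{X}_1 - \bar{X}_2$, and you need to consider the sampling distribution of the difference between means. Since $\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right)$ and $\bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$, and the two samples are independent, the mean and variance of this distribution can be found as follows:

$$E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$$

$$\operatorname{Var}(\bar{X}_1 - \bar{X}_2) = \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Remember the + sign here.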
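The + sign in the variance can be checked by simulation. The sketch below is a minimal illustration using only Python's standard library; the population means, standard deviations and sample sizes are made-up values chosen for the demonstration, not taken from the text. It draws many pairs of independent samples and compares the mean and variance of $\bar{x}_1 - \bar{x}_2$ with $\mu_1 - \mu_2$ and $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.

```python
import random
from statistics import fmean, pvariance

random.seed(1)  # fixed seed so that repeated runs give the same output

# Illustrative population parameters and sample sizes (not from the text)
mu1, sigma1, n1 = 10.0, 2.0, 5
mu2, sigma2, n2 = 7.0, 3.0, 8

diffs = []
for _ in range(20_000):
    # one pair of independent samples, each reduced to its sample mean
    xbar1 = fmean(random.gauss(mu1, sigma1) for _ in range(n1))
    xbar2 = fmean(random.gauss(mu2, sigma2) for _ in range(n2))
    diffs.append(xbar1 - xbar2)

theory_mean = mu1 - mu2                         # 3.0
theory_var = sigma1**2 / n1 + sigma2**2 / n2    # 0.8 + 1.125: the variances add

print("simulated mean:", round(fmean(diffs), 3), "theory:", theory_mean)
print("simulated variance:", round(pvariance(diffs), 3), "theory:", theory_var)
```

Changing the + to a − in `theory_var` gives a value that the simulated variance clearly does not match.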
The distribution of $\bar{X}_1 - \bar{X}_2$ depends on various factors, and careful analysis of a given situation is required in order to decide which test to use. In each situation described below, the underlying distributions are normal.

Note that, for reference, the 95% confidence intervals for $\mu_1 - \mu_2$ are also given.


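Test 3a: The population variances $\sigma_1^2$ and $\sigma_2^2$ are known

If the variances $\sigma_1^2$ and $\sigma_2^2$ are known,
the test statistic is $\bar{X}_1 - \bar{X}_2$, where $\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$.

In standardised form,
the test statistic is $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$, where $Z \sim N(0, 1)$.

Note that the 95% confidence limits for $\mu_1 - \mu_2$ are $(\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.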
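The Test 3a statistic and its 95% confidence limits are simple to compute from summary figures. This is a minimal sketch using only the standard library (`statistics.NormalDist` supplies the normal critical value); the function names `z_known_variances` and `ci95` are hypothetical helpers, and the usage line plugs in the figures from Example 11.11 below.

```python
from math import sqrt
from statistics import NormalDist


def z_known_variances(xbar1, xbar2, var1, var2, n1, n2, diff0=0.0):
    """Test 3a: standardised statistic when sigma1^2 and sigma2^2 are known.

    diff0 is the value of mu1 - mu2 under H0 (usually 0).
    Returns (z, se), where se = sqrt(var1/n1 + var2/n2).
    """
    se = sqrt(var1 / n1 + var2 / n2)
    return (xbar1 - xbar2 - diff0) / se, se


def ci95(xbar1, xbar2, se):
    """95% confidence limits (xbar1 - xbar2) +/- 1.96 * se."""
    d = xbar1 - xbar2
    return d - 1.96 * se, d + 1.96 * se


# Figures from Example 11.11: sd 0.04 kg (n = 60) and 0.09 kg (n = 50)
z, se = z_known_variances(3.03, 3.00, 0.04**2, 0.09**2, 60, 50)
z_crit = NormalDist().inv_cdf(0.99)   # upper-tail 1% critical value, about 2.326
print(round(z, 3), round(z_crit, 3), "reject H0" if z > z_crit else "do not reject H0")
print(ci95(3.03, 3.00, se))
```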




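Test 3b: The populations have a common variance, $\sigma^2$, which is known

If there is a common population variance $\sigma^2\ (= \sigma_1^2 = \sigma_2^2)$ which is known,
the test statistic is $\bar{X}_1 - \bar{X}_2$, where $\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$.

In standardised form,
the test statistic is $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $Z \sim N(0, 1)$.

Note that the 95% confidence limits for $\mu_1 - \mu_2$ are $(\bar{x}_1 - \bar{x}_2) \pm 1.96\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$.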
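With a common known $\sigma$, the standard error factorises as $\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$. The sketch below (standard library only; `z_common_sigma` is a hypothetical name) returns the statistic together with the 95% confidence limits, using the figures of Example 11.12 below.

```python
from math import sqrt
from statistics import NormalDist


def z_common_sigma(xbar1, xbar2, sigma, n1, n2, diff0=0.0):
    """Test 3b: common, known population standard deviation sigma.

    Returns (z, lower, upper); (lower, upper) are the 95% confidence limits
    (xbar1 - xbar2) +/- 1.96 * sigma * sqrt(1/n1 + 1/n2).
    """
    se = sigma * sqrt(1 / n1 + 1 / n2)
    d = xbar1 - xbar2
    return (d - diff0) / se, d - 1.96 * se, d + 1.96 * se


# Figures from Example 11.12: guides (n1 = 144), scouts (n2 = 100), sigma = 3.48
z, lower, upper = z_common_sigma(26.81, 27.53, 3.48, 144, 100)
z_crit = NormalDist().inv_cdf(0.05)   # lower-tail 5% critical value, about -1.645
print(round(z, 3), "reject H0" if z < z_crit else "do not reject H0")
print(round(lower, 3), round(upper, 3))
```

The interval corresponds to a two-tailed test at the 5% level, whereas Example 11.12 uses a one-tailed test, so the two should not be read as the same decision rule.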


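Test 3c: The populations have a common variance, $\sigma^2$, which is unknown

If the common population variance $\sigma^2$ is unknown, use its unbiased estimate $\hat{\sigma}^2$ instead. This is sometimes known as a pooled two-sample estimate, where

$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$

($s_1^2$ and $s_2^2$ are the sample variances, calculated with divisors $n_1$ and $n_2$ respectively).

An alternative format for $\hat{\sigma}^2$ is

$$\hat{\sigma}^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$$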
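Both formats give the same number because $n s^2 = \sum (x - \bar{x})^2$ when $s^2$ is calculated with divisor $n$. A minimal sketch (standard library only; the function names are hypothetical) computes the pooled estimate both ways, using the summary figures of Example 11.13 below.

```python
from math import sqrt


def pooled_from_sample_variances(n1, s1_sq, n2, s2_sq):
    """(n1*s1^2 + n2*s2^2) / (n1 + n2 - 2), with s^2 = sum((x - xbar)^2) / n."""
    return (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)


def pooled_from_sums_of_squares(ss1, n1, ss2, n2):
    """(sum (x1 - x1bar)^2 + sum (x2 - x2bar)^2) / (n1 + n2 - 2)."""
    return (ss1 + ss2) / (n1 + n2 - 2)


# Example 11.13: n1 = 64, s1 = 12.96; n2 = 36, s2 = 11.41
v = pooled_from_sample_variances(64, 12.96**2, 36, 11.41**2)

# Same estimate from sums of squared deviations, using n * s^2 = sum (x - xbar)^2
v_alt = pooled_from_sums_of_squares(64 * 12.96**2, 64, 36 * 11.41**2, 36)

print(round(v, 1), round(sqrt(v), 2), round(v_alt, 1))
```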


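The distribution of $\bar{X}_1 - \bar{X}_2$ depends on whether the samples taken are large or small.

Large samples

For large samples the distribution of $\bar{X}_1 - \bar{X}_2$ is approximately normal.

The test statistic is $\bar{X}_1 - \bar{X}_2$, where $\bar{X}_1 - \bar{X}_2 \approx N\left(\mu_1 - \mu_2, \hat{\sigma}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$.

In standardised form,
the test statistic is $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $Z \approx N(0, 1)$.

Note that the 95% confidence limits for $\mu_1 - \mu_2$ are $(\bar{x}_1 - \bar{x}_2) \pm 1.96\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$.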


Small samples

For small samples the standardised form of the distribution of $\bar{X}_1 - \bar{X}_2$ follows a t-distribution.

The test statistic is $T = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $T \sim t(n_1 + n_2 - 2)$.

Note that the 95% confidence limits for $\mu_1 - \mu_2$ are $(\bar{x}_1 - \bar{x}_2) \pm t\,\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, where $t$ is such that $P(T < t) = 0.975$ for $t(n_1 + n_2 - 2)$.
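The small-sample procedure can be sketched from raw data as follows. This is an illustration using only the standard library, which has no t-distribution, so the critical value 2.086 is typed in from t tables ($\nu = 20$, $p = 0.975$); if SciPy is available, `scipy.stats.t.ppf(0.975, 20)` returns the same quantile. The helper names `summary` and `pooled_t` are hypothetical, and the data are the dust deposits from Example 11.15 below, including its part (b)(ii) z-test with $\sigma^2 = 196$.

```python
from math import sqrt


def summary(data):
    """Return (n, mean, sum of squared deviations) for a list of values."""
    n = len(data)
    m = sum(data) / n
    return n, m, sum((x - m) ** 2 for x in data)


def pooled_t(sample1, sample2, diff0=0.0):
    """Small-sample statistic T using the pooled estimate sigma-hat.

    Returns (t, sigma_hat, df); under H0, T ~ t(n1 + n2 - 2).
    """
    n1, m1, ss1 = summary(sample1)
    n2, m2, ss2 = summary(sample2)
    df = n1 + n2 - 2
    sigma_hat = sqrt((ss1 + ss2) / df)
    t = (m1 - m2 - diff0) / (sigma_hat * sqrt(1 / n1 + 1 / n2))
    return t, sigma_hat, df


# Dust deposits (g) from Example 11.15
type_a = [73.1, 56.4, 82.1, 67.2, 78.7, 75.1, 48.0, 53.3, 55.5, 61.5, 60.6, 55.2, 63.1]
type_b = [53.0, 39.3, 55.8, 58.8, 41.2, 66.6, 46.0, 56.4, 58.9]

t, sigma_hat, df = pooled_t(type_a, type_b)
t_crit = 2.086   # typed in from t tables: nu = 20, p = 0.975 (two-tailed 5%)
print(round(t, 2), round(sigma_hat, 2), df, abs(t) > t_crit)

# 95% confidence limits: (x1bar - x2bar) +/- t_crit * sigma_hat * sqrt(1/n1 + 1/n2)
n1, m1, _ = summary(type_a)
n2, m2, _ = summary(type_b)
root = sqrt(1 / n1 + 1 / n2)
print(round(m1 - m2 - t_crit * sigma_hat * root, 2),
      round(m1 - m2 + t_crit * sigma_hat * root, 2))

# Part (b)(ii) of the example: sigma^2 = 196 known, so a z-test is used instead
z = (m1 - m2) / (14 * root)
print(round(z, 3), abs(z) > 1.96)
```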


Example 11.11

The masses of a certain species of small animal are believed to be greater in Region A than in Region B. The masses in both regions are normally distributed, with masses in Region A having a standard deviation of 0.04 kg and masses in Region B having a standard deviation of 0.09 kg.

To test the theory, random samples are taken: 60 animals from Region A had a mean mass of 3.03 kg and 50 animals from Region B had a mean mass of 3.00 kg.

Does this provide evidence, at the 1% level, that the animals of this species in Region A have a greater mass than those in Region B?

Solution 11.11

1. Define the variables.
Let $X_1$ be the mass, in kilograms, of an animal in Region A and let the population mean be $\mu_1$. Then $X_1 \sim N(\mu_1, 0.04^2)$.
Let $X_2$ be the mass, in kilograms, of an animal in Region B and let the population mean be $\mu_2$. Then $X_2 \sim N(\mu_2, 0.09^2)$.

2. State $H_0$ and $H_1$.
$H_0: \mu_1 - \mu_2 = 0$ (there is no difference in the masses between the regions)
$H_1: \mu_1 - \mu_2 > 0$ (the animals in Region A have greater mass)

3. State the distribution of the test statistic.
Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$:
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$ with $n_1 = 60$, $n_2 = 50$.
If $H_0$ is true then $\mu_1 - \mu_2 = 0$, so $\bar{X}_1 - \bar{X}_2 \sim N\left(0, \dfrac{0.04^2}{60} + \dfrac{0.09^2}{50}\right)$.

Use a one-tailed test (upper tail) at the 1% level.

The critical z-value is 2.326, so reject $H_0$ if $z > 2.326$, where

$$z = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\dfrac{0.04^2}{60} + \dfrac{0.09^2}{50}}} = \frac{3.03 - 3.00}{0.0137\ldots} = 2.184\ldots$$

Since $z < 2.326$, do not reject $H_0$.

There is no evidence, at the 1% level, that the animals in Region A have a greater mass than those in Region B.

Example 11.12

The same physical fitness test was given to a group of 100 scouts and to a group of 144 guides. The maximum score was 30. The guides obtained a mean score of 26.81 and the scouts obtained a mean score of 27.53. Assuming that the fitness scores are normally distributed with a common population standard deviation of 3.48, test at the 5% level of significance whether the guides did not do as well as the scouts in the fitness test.

Solution 11.12

Let $X_1$ be a guide's score and let the population mean be $\mu_1$. Then $X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma = 3.48$.

Let $X_2$ be a scout's score and let the population mean be $\mu_2$. Then $X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma = 3.48$.

$H_0: \mu_1 - \mu_2 = 0$ (there is no difference in the performance)
$H_1: \mu_1 - \mu_2 < 0$ (the guides did not perform as well as the scouts)

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $n_1 = 144$, $n_2 = 100$.

If $H_0$ is true then $\mu_1 - \mu_2 = 0$, so $\bar{X}_1 - \bar{X}_2 \sim N\left(0, 3.48^2\left(\dfrac{1}{144} + \dfrac{1}{100}\right)\right)$.

Use a one-tailed test (lower tail) at the 5% level.

The critical z-value is $-1.645$, so reject $H_0$ if $z < -1.645$, where $z = \dfrac{\bar{x}_1 - \bar{x}_2 - 0}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$.

The sample values are $\bar{x}_1 = 26.81$, $\bar{x}_2 = 27.53$.

$$z = \frac{26.81 - 27.53}{3.48\sqrt{\dfrac{1}{144} + \dfrac{1}{100}}} = \frac{-0.72}{0.452\ldots} = -1.589\ldots$$

Since $z > -1.645$, do not reject $H_0$.

There is no evidence, at the 5% level, that the guides did not perform as well as the scouts in the fitness tests.


Example 11.13 


An investigation was carried out to assess the effects of adding certain vitamins to the diet. A group of two-week-old rats was given a vitamin supplement in their diet for a period of one month, after which time their masses were noted. A control group of rats of the same age was fed on an ordinary diet and their masses were also noted after one month.

The results are summarised in the table:

                               Number in sample    Mean     Standard deviation
With vitamin supplement        64                  89.62    12.96
Without vitamin supplement     36                  83.5…    11.41

Treating the samples as large samples from normal distributions with the same variance, $\sigma^2$, test at the 5% level whether the results provide evidence that rats given the vitamin supplement have a greater mass, at age six weeks, than those not given the vitamin supplement.


Solution 11.13 


Let $X_1$ be the mass of a rat given a vitamin supplement and let the population mean be $\mu_1$. Then $X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma$ unknown.

Let $X_2$ be the mass of a rat in the control group and let the population mean be $\mu_2$. Then $X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma$ unknown.

Since the common population variance $\sigma^2$ is unknown, use $\hat{\sigma}^2$, where

$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{64 \times 12.96^2 + 36 \times 11.41^2}{64 + 36 - 2} = 157.5\ldots$$

$\hat{\sigma} = 12.55\ldots$

$H_0: \mu_1 - \mu_2 = 0$ (there is no difference in the masses of the two groups)
$H_1: \mu_1 - \mu_2 > 0$ (the rats given vitamin supplements are heavier)

Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$.

The test statistic $\bar{X}_1 - \bar{X}_2 \approx N\left(\mu_1 - \mu_2, \hat{\sigma}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $\hat{\sigma} = 12.55$, $n_1 = 64$, $n_2 = 36$.

If $H_0$ is true then $\mu_1 - \mu_2 = 0$, so $\bar{X}_1 - \bar{X}_2 \approx N\left(0, 12.55^2\left(\dfrac{1}{64} + \dfrac{1}{36}\right)\right)$.

Use a one-tailed test (upper tail) at the 5% level.

The critical z-value is 1.645, so reject $H_0$ if $z > 1.645$, where $z = \dfrac{\bar{x}_1 - \bar{x}_2 - 0}{12.55\sqrt{\dfrac{1}{64} + \dfrac{1}{36}}}$.

Since $z > 1.645$, reject $H_0$.


There is evidence, at the 5% level, that the rats given the vitamin supplement 
have a greater mass than the rats not given the supplement. 
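The rejection rule of Solution 11.13 can also be read as a threshold on the observed difference in sample means: reject $H_0$ when $\bar{x}_1 - \bar{x}_2$ exceeds $1.645\,\hat{\sigma}\sqrt{\frac{1}{64} + \frac{1}{36}}$. A short standard-library sketch of that arithmetic:

```python
from math import sqrt
from statistics import NormalDist

# Pooled estimate from the summary table (sample variances with divisor n)
sigma_hat = sqrt((64 * 12.96**2 + 36 * 11.41**2) / (64 + 36 - 2))   # about 12.55
se = sigma_hat * sqrt(1 / 64 + 1 / 36)     # standard error of X1bar - X2bar
z_crit = NormalDist().inv_cdf(0.95)        # upper-tail 5% critical value, about 1.645
min_diff = z_crit * se                     # reject H0 if xbar1 - xbar2 exceeds this
print(round(se, 2), round(min_diff, 2))
```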




Example 11.14 


Two statistics teachers, Mr Chalk and Mr Talk, argue about their abilities at golf. Mr Chalk claims that with a number 7 iron he can hit the ball, on average, at least 10 m further than Mr Talk. They conducted an experiment, measuring the distances for several shots.

Denoting the distance Mr Chalk hits the ball by $x$ metres, the following results were obtained:
$n_1 = 40$, $\sum x = 4080$, $\sum (x - \bar{x})^2 = 1132$.

Denoting the distance Mr Talk hits the ball by $y$ metres, the following results were obtained:
$n_2 = 35$, $\sum y = 3325$, $\sum (y - \bar{y})^2 = 1197$.

Assuming that the populations have a common variance, test whether there is evidence, at the 1% level, to support Mr Chalk's claim.


Solution 11.14 


Let $X$ be the distance, in metres, for Mr Chalk and let the population mean be $\mu_1$.
Then $X \sim N(\mu_1, \sigma^2)$ with $\sigma$ unknown.

Let $Y$ be the distance, in metres, for Mr Talk and let the population mean be $\mu_2$.
Then $Y \sim N(\mu_2, \sigma^2)$ with $\sigma$ unknown.

An unbiased estimate for $\sigma^2$ is $\hat{\sigma}^2$, where

$$\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2} = \frac{1132 + 1197}{40 + 35 - 2} = 31.904\ldots$$

$\hat{\sigma} = 5.6483\ldots$

The unbiased estimate of the common population standard deviation is 5.648 (3 d.p.).

$H_0: \mu_1 - \mu_2 = 10$ (Mr Chalk hits the ball 10 m further than Mr Talk)

Mr Chalk claims that he can hit the ball at least 10 m further than Mr Talk. Mr Talk wants to refute this, so take as alternative hypothesis that Mr Chalk hits the ball less than 10 m further than Mr Talk.

$H_1: \mu_1 - \mu_2 < 10$ (Mr Chalk hits the ball less than 10 m further than Mr Talk)

Consider the distribution of the difference between the means, $\bar{X} - \bar{Y}$:
$\bar{X} - \bar{Y} \approx N\left(\mu_1 - \mu_2, \hat{\sigma}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $\hat{\sigma} = 5.648$, $n_1 = 40$, $n_2 = 35$.

If $H_0$ is true then $\mu_1 - \mu_2 = 10$,
so $\bar{X} - \bar{Y} \approx N\left(10, 5.648^2\left(\dfrac{1}{40} + \dfrac{1}{35}\right)\right)$.
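The remaining arithmetic of Solution 11.14 can be carried through in a few lines. This standard-library sketch recomputes the pooled estimate from the sums of squared deviations and standardises $\bar{x} - \bar{y}$ under $H_0: \mu_1 - \mu_2 = 10$, treating the samples as large so that a z-value is compared with the lower-tail 1% point.

```python
from math import sqrt
from statistics import NormalDist

n1, sum_x, ss_x = 40, 4080, 1132   # Mr Chalk
n2, sum_y, ss_y = 35, 3325, 1197   # Mr Talk

sigma_hat = sqrt((ss_x + ss_y) / (n1 + n2 - 2))   # pooled estimate, about 5.648
xbar, ybar = sum_x / n1, sum_y / n2                # 102 and 95
z = (xbar - ybar - 10) / (sigma_hat * sqrt(1 / n1 + 1 / n2))   # H0: mu1 - mu2 = 10
z_crit = NormalDist().inv_cdf(0.01)                # lower-tail 1% critical value, about -2.326
print(round(sigma_hat, 3), round(z, 2), "reject H0" if z < z_crit else "do not reject H0")
```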


Use a one-tailed test (lower tail) at the 1% level.

The critical z-value is $-2.326$, so reject $H_0$ if $z < -2.326$,
where $z = \dfrac{\bar{x} - \bar{y} - 10}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{\bar{x} - \bar{y} - 10}{5.648\sqrt{\dfrac{1}{40} + \dfrac{1}{35}}}$.

From the samples, $\bar{x} = \dfrac{\sum x}{n_1} = \dfrac{4080}{40} = 102$ and $\bar{y} = \dfrac{\sum y}{n_2} = \dfrac{3325}{35} = 95$.

So $z = \dfrac{102 - 95 - 10}{5.648\sqrt{\dfrac{1}{40} + \dfrac{1}{35}}} = -2.29\ldots$

Since $z > -2.326$, do not reject $H_0$.

There is not sufficient evidence, at the 1% level, to reject Mr Chalk's claim that he hits the ball, on average, at least 10 m further than Mr Talk.

Example 11.15

An investigation was conducted into the dust content in the flue gases of two types of solid fuel boilers. Thirteen boilers of type A and 9 boilers of type B were used under identical fuelling and extraction conditions. Over a similar period, the following quantities, in grams, of dust were deposited in similar traps inserted in each of the 22 flues.

Dust deposit (g) in Type A boilers:
73.1, 56.4, 82.1, 67.2, 78.7, 75.1, 48.0, 53.3, 55.5, 61.5, 60.6, 55.2, 63.1

Dust deposit (g) in Type B boilers:
53.0, 39.3, 55.8, 58.8, 41.2, 66.6, 46.0, 56.4, 58.9

(a) Find the mean and variance of each of the samples.
(b) Assuming that these independent samples came from normal populations with the same variances:
(i) use a two-sample t-test at the 5% level of significance to determine whether there is any difference between the two samples as regards the mean dust deposit,
(ii) test at the 5% level of significance whether there is any difference between the two samples as regards the mean dust deposit where this time you should also assume that the population variances are both known to be 196.0.
(c) Explain the apparent contradiction in your results. (AEB)

Solution 11.15

(a) Using a calculator in standard deviation mode, the following values were obtained. Check them using your calculator.

                  Mean     Variance
Type A boiler     63.83    104.32
Type B boiler     52.89    72.07

(b) (i) Let $X_1$ be the mass of dust deposited in a type A boiler and let the population mean be $\mu_1$ and population variance be $\sigma^2$. Then $X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma$ unknown.

Let $X_2$ be the mass of dust deposited in a type B boiler and let the population mean be $\mu_2$ and population variance be $\sigma^2$. Then $X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma$ unknown.

Since $\sigma^2$ is unknown, use $\hat{\sigma}^2$, where

$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{13 \times 104.32 + 9 \times 72.07}{13 + 9 - 2} = 100.23\ldots$$

$\hat{\sigma} = 10.01$ (2 d.p.)

2. State $H_0$ and $H_1$.
$H_0: \mu_1 - \mu_2 = 0$ (there is no difference in the masses deposited)
$H_1: \mu_1 - \mu_2 \ne 0$ (there is a difference in the masses deposited)

3. State the distribution of the test statistic.
Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$.

Since the samples are small and the common population variance is unknown, the test statistic is $T$, where $T \sim t(n_1 + n_2 - 2)$ and

$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

If $H_0$ is true then $\mu_1 - \mu_2 = 0$,
so $T = \dfrac{\bar{X}_1 - \bar{X}_2 - 0}{10.01\sqrt{\dfrac{1}{13} + \dfrac{1}{9}}}$ and $T \sim t(20)$.

Use a two-tailed test at the 5% level.

Because you want 2.5% in each tail, the critical value for $t$ is found from row $\nu = 20$, $p = 0.975$, giving $\pm 2.086$.

Critical values for t (see page 650)

$p$       0.75     0.90     0.95     0.975
$\nu = 1$     1.000    3.078    6.314    12.71
$\nu = 2$     0.816    1.886    2.920    4.303
...
$\nu = 19$    0.688    1.328    1.729    2.093
$\nu = 20$    0.687    1.325    1.725    2.086

Reject $H_0$ if $t < -2.086$ or $t > 2.086$, i.e. if $|t| > 2.086$.

From the samples, $\bar{x}_1 = 63.83$, $\bar{x}_2 = 52.89$.

$$t = \frac{63.83 - 52.89}{10.01\sqrt{\dfrac{1}{13} + \dfrac{1}{9}}} = 2.52 \text{ (3 s.f.)}$$

Since $t > 2.086$, reject $H_0$.

There is evidence, at the 5% level, of a difference between the samples with regard to the mean dust deposit.

(b) (ii) This time the population variances are known to be 196 and so a z-test is performed, rather than a t-test.

$X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma = \sqrt{196} = 14$
$X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma = 14$

The hypotheses are as before, but the test statistic, the difference between the means, is distributed

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $\sigma = 14$, $n_1 = 13$, $n_2 = 9$.

A two-tailed test at the 5% level gives critical z-values of $\pm 1.96$ (see page 649).

So reject $H_0$ if $z < -1.96$ or $z > 1.96$, i.e. if $|z| > 1.96$, where

$$z = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{63.83 - 52.89}{14\sqrt{\dfrac{1}{13} + \dfrac{1}{9}}} = 1.802\ldots$$

Since $|z| < 1.96$, $H_0$ is not rejected.

There is no evidence, at the 5% level, of a difference between the samples with regard to the mean dust deposit.

(c) Considering the variances of the samples, it would seem that the population variance of 196.0 given in part (b)(ii) is suspect. The value of just over 100 given by the unbiased estimate appears more reasonable and so result (b)(i) is likely to be more accurate.

Exercise 11d  The difference between means of two normal populations

Section A: z-tests


1. In each of the following, a random sample of size $n_1$ is taken from population $X$ and a random sample of size $n_2$ is taken from population $Y$.

Use the information given to test the hypotheses stated at the level of significance indicated.


(a) $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$


ny ox a, ny xy oe Hypotheses Level 
@ 100.4250 30. 80. 3544 35 Hy —y=0 5% 
Fires #0 
(i) 20 95 23 05 138 28 Hewieys 2% 
Ay <p, 
(ii) 50. sas 5 50 1480.74 Hy: ly =a 1% 
Ay i> ws 


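(b) $X$ and $Y$ have a common variance $\sigma^2$, so $X \sim N(\mu_1, \sigma^2)$, $Y \sim N(\mu_2, \sigma^2)$.

        $n_1$    $\sum x$    $n_2$    $\sum y$    Common population         Hypotheses                                   Level
                                                  standard deviation ($\sigma$)
(i)     50       2480        40       1908        4.5                       $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$    2%
(ii)    100      12730       100      12410       10.9                      $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$      5%
(iii)   30       192         45       315         1.25                      $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 < \mu_2$      1%
(iv)    200      18470       300      27663       0.86                      $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$    10%

        $n_1$    $\sum x$    $\sum (x - \bar{x})^2$    $n_2$    $\sum y$    $\sum (y - \bar{y})^2$    Hypotheses                                             Level
(v)     40       2128        810                       50       2580        772                       $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$                5%
(vi)    80       6824        2508                      100      8740        3969                      $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$              2%
(vii)   65       5369        8886                      80       4672        9026                      $H_0: \mu_1 - \mu_2 = 20$, $H_1: \mu_1 - \mu_2 > 20$      1%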


2. A large group of sunflowers is growing in the shady side of a garden. A random sample of 36 of these sunflowers is measured. The sample mean height is found to be 2.86 m, and the sample standard deviation is found to be 0.60 m. A second group of sunflowers is growing in the sunny side of the garden. A random sample of 26 of these sunflowers is measured. The sample mean height is found to be 3.29 m and the sample standard deviation is found to be 0.9 m. Treating the samples as large samples from normal distributions having the same variance but possibly different means, obtain a pooled estimate of the variance and test whether the results provide significant evidence (at the 5% level) that the sunny-side sunflowers grow taller, on average, than the shady-side sunflowers. (C)


3. The lengths, in millimetres, of 9 screws selected 
at random from a large consignment are found 
to be: 


8.00, 8.02, 8.03, 7.99, 8.00, 
8.01, 8.01, 7.99, 8.01. 


From a second large consignment, 16 screws are 
selected at random and their mean length is 
found to be 7.992 mm. 

Assuming that both samples are from normal 
populations with variance 0.0001, test, at the 
5% significance level, the hypothesis that the 
second population has the same mean as the first 
population, against the alternative hypothesis 
that the second population has a smaller mean
than the first population. (C)


4. Hischi and Taschi are two makes of video tapes. 
They are both advertised as having a recording 
time of 3 hours. A sample of 49 Hischi tapes 
was tested and, denoting the actual recording time
by $h$ minutes, the following results were obtained:

$\sum h = 8673$, $\sum (h - \bar{h})^2 = 12\,720$

A sample of 81 Taschi tapes was also tested.

Denoting the actual recording time by $t$ minutes,
the results obtained were:

$\sum t = 14\,904$, $\sum (t - \bar{t})^2 = 33\,488$


If the recording times for the two makes are 
normally distributed and have a common 
variance, show that the unbiased estimate of this 
common variance is 361. Test whether there is 
significant evidence, at the 5% level, of a 
difference in the mean recording times. Is the 
difference significant at the 4% level? 


5. A large number of tomato plants are grown 
under controlled conditions. Half of the plants, 
chosen at random, are treated with a new 
fertiliser, and the other half of the plants are 
treated with a standard fertiliser. Random samples of 100 plants are selected from each half, and records are kept of the total crop mass of each plant. For those treated with the new fertiliser, the crop masses (in suitable units) are summarised by the figures

$\sum x = 1030.0$, $\sum x^2 = 11\,045.59$.

The corresponding figures for those plants treated with the standard fertiliser are

$\sum y = 990.0$, $\sum y^2 = 10\,079.19$.

Treating the sample as a large sample from a normal distribution, and assuming that the population variances of both distributions are equal, obtain a two-sample pooled estimate of the common population variance.

Assuming that it is impossible for the new fertiliser to be less efficacious than the old fertiliser and assuming that both distributions are normal, test whether the results provide significant evidence (at the 3% level) that the new fertiliser is associated with a greater mean crop mass, stating clearly your null and alternative hypotheses. (C)


6. Mr Brown and Mr Green work at the same office and live next door to each other. Each day they leave for work together but travel by different routes. Mr Brown maintains that his route is quicker, on average, by at least four minutes. Both men time their journeys in minutes over a period of ten weeks. The results obtained were:

Mr Brown: $n_1 = 50$, $\bar{x}_1 = 21$, $s_1^2 = 10.24$
Mr Green: $n_2 = 50$, $\bar{x}_2 = 24$, $s_2^2 = 7.84$

Assuming that the times are normally distributed and that they have a common population variance, test at the 5% level whether Mr Brown's claim can be accepted.


7. A random sample of size 100 is taken from a normal population with variance $\sigma_1^2 = 40$. The sample mean $\bar{x}_1$ is 38.3. Another random sample, of size 80, is taken from a normal population with variance $\sigma_2^2 = 30$. The sample mean $\bar{x}_2$ is 40.1. Test, at the 5% level, whether there is a significant difference in the population means $\mu_1$ and $\mu_2$.


8. A certain political group maintains that girls reach a higher standard in single-sex classes than in mixed classes. To test this hypothesis 140 girls of similar ability are split into two groups, with 68 attending classes containing only girls and 72 attending classes with boys. All the classes follow the same syllabus and after a specified time the girls are given a test. The test results are summarised thus:

Girls in the mixed classes:
$\sum x = 7920$, $\sum x^2 = 879\,912$
Girls in single-sex classes:
$\sum y = 7820$, $\sum y^2 = 904\,808$


Treating both samples as large samples from 
normal distributions having the same variance, 
obtain a two-sample pooled estimate of the 
common population variance. Test whether the 
results provide significant evidence, at the 1% 
level, that girls reach a higher standard in single- 
sex classes. 


9. The mean height of 50 male students of a college who took an active part in athletic activities was 178 cm with a standard deviation of 5 cm, while 50 male students who showed no interest in such activities had a mean height of 176 cm with a standard deviation of 7 cm. Test the hypothesis that male students who take an active part in athletic activities have the same mean height as the other male students.

If both samples had been of size $n$, instead of 50, find the least value of $n$ which would ensure that the observed difference of 2 cm in the mean height would be significant at the 1% level. (Assume that the samples continue to have the same means and standard deviations.) (C)


10. A random sample of 27 individuals from the population of young men aged 18 and of high intelligence have foot lengths (in centimetres, to the nearest centimetre) as summarised below.

Foot length (cm)    24    25    26    27    28    29    30


Number with this 
foot length PED ge ee See 


Obtain the sample mean and show that the 
unbiased estimate of the population variance, 
based on this sample, is 2.00. Obtain a 96% 
confidence interval for the mean foot length of 
this type of person. 


A random sample of 48 individuals from the 
population of young men aged 18 and of 
moderate intelligence have foot lengths 
summarised by $\bar{x} = 26.6$, $\sum (x - \bar{x})^2 = 123.20$. A
complex genetic theory suggests that persons of
high intelligence have a greater foot length than
do those of moderate intelligence. The two
samples described above may be assumed to have been drawn at random from independent normal distributions having a common variance. Obtain an unbiased two-sample estimate of this common variance. Treating the samples as large samples, test this genetic theory, using a significance test at the 1% significance level and stating clearly the hypotheses under comparison. (C)

Section B: t-tests

1. A random sample of size $n_1$ is taken from population $X \sim N(\mu_1, \sigma^2)$ and a random sample of size $n_2$ is taken from population $Y \sim N(\mu_2, \sigma^2)$.
(a) Obtain an unbiased estimate of $\sigma^2$ by pooling the results from the two samples.
(b) Test the hypotheses stated at the level of significance indicated.


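        $n_1$    $\sum x$    $\sum (x - \bar{x})^2$    $n_2$    $\sum y$    $\sum (y - \bar{y})^2$    Hypotheses                                           Level
(i)     6        171         83                        7        164.5       112                       $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$              5%
(ii)    5        678.5       562.3                     7        971.6       308.6                     $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$            5%
(iii)   8        238.4       296                       10       206         145                       $H_0: \mu_1 - \mu_2 = 4$, $H_1: \mu_1 - \mu_2 > 4$      1%
(iv)    12       116.16      45.1                      18       156.96      72                        $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$            10%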


2. The heights (measured to the nearest centimetre) 
of a random sample of six policemen from a 
certain force in Wales were found to be: 


176, 180, 179, 181, 183, 179, 


The heights (measured to the nearest centimetre) of a random sample of 11 policemen from a certain force in Scotland gave the following data:

$\sum y = 1991$, $\sum (y - \bar{y})^2 = 54$

Test at the 5% level the hypothesis that Welsh policemen are shorter than Scottish policemen. Assume that the heights of policemen in both forces are normally distributed and have a common population variance.


3. An expert golfer wishes to discover whether the average distances travelled by two different brands of golf ball differ significantly. He tests each ball by hitting it with his driver and measuring the distance $X$ (in metres) that it travels. The distribution of $X$ may be assumed to be normal.
His results for a random sample of 9 'Farfly' golf balls were $\bar{x} = 214$ and $\sum (x - \bar{x})^2 = 2048$.
His results for a random sample of 16 'Gofar' golf balls were $\bar{x} = 224$ and $\sum (x - \bar{x})^2 = 2460$.
Assuming that the variance of $X$ is the same for both types of golf ball, obtain a pooled (two-sample) estimate of this variance and test at the 5% level whether his results for 'Gofar' golf balls differ significantly from those for 'Farfly' golf balls. (C)


4. Mr Mean notes the time, in minutes, that it takes him to drive to work in the mornings. The results are:

$n_1 = 8$, $\sum x_1 = 120$, $\sum x_1^2 = 1827$

For his return journey in the rush hour, Mr Mean notes that:

$n_2 = 10$, $\sum x_2 = 230$, $\sum x_2^2 = 5436$

He maintains that, on average, it takes him at least ten minutes longer to drive home.

(a) Using the results from the two samples, find an unbiased estimate of the common population variance.
(b) Assuming that the times of all journeys are normally distributed, use the two-sample t-test at the 5% level to test Mr Mean's claim.


5. Random samples of Year 10 pupils at two schools are given the same mathematics test. The results are summarised thus:

School A: $n_1 = 20$, $\bar{x} = 43$, $\sum (x - \bar{x})^2 = 1296$
School B: $n_2 = 17$, $\bar{y} = 36$, $\sum (y - \bar{y})^2 = 1388$

Assuming that the distributions of marks are normal with a common population variance, test at the 2% level whether there is a significant difference in the mathematical ability of the Year 10 pupils at the two schools.


6. A random sample of size $n_1$ is taken from a population $P_1$ whose mean is $\mu_1$ and variance $\sigma_1^2$, and a random sample of size $n_2$ is taken from population $P_2$ with mean $\mu_2$ and variance $\sigma_2^2$. Under what circumstances is it valid to test the hypothesis $\mu_1 - \mu_2 = 0$ using a two-sample t-test?

A machine fills bags of sugar and a random sample of 20 bags selected from a week's production yielded a mean weight of 499.8 g with standard deviation 0.63 g. A week later a sample of 25 bags yielded a mean weight of 500.2 g with standard deviation 0.48 g.

Assuming that your stated conditions are satisfied, perform a test to determine whether the mean has increased significantly during the second week.

Test whether the mean during the second week could be 500 g. (Use a 5% significance level for both tests.)


7. A liquid product is sold in containers. The containers are filled by a machine. The volumes of liquid (in millilitres) in a random sample of 6 containers were found to be:

497.8, 501.4, 500.2, 500.8, 498.3, 500.0.

After overhaul of the machine, the volumes (in millilitres) in a random sample of 11 containers were found to be

501.1, 499.6, 500.3, 500.9, 498.7, 502.1,
500.4, 499.7, 501.0, 500.1, 499.3.

It is desired to examine whether the average volume of liquid delivered to a container by the machine is the same after overhaul as it was before.

(a) State the assumptions that are necessary for the use of the customary t-test.
(b) State formally the null and alternative hypotheses that are to be tested.
(c) Carry out the t-test, using a 5% level of significance.
(d) Discuss briefly which of the assumptions in (a) is least likely to be valid in practice and why. (MEI)


8. The performances of trainee actors who have
passed through a drama school are rated by a
panel of experienced actors who assign an
overall mark for each trainee. The drama school 
has recently introduced a new training method 
which, it is claimed, will lead to better 
performances. 
The marks for a random sample of 6 trainees 
using the old training method were 


243, 228, 220, 206, 230, 198. 


and the marks for a random sample of 8 using 
the new method were 


235, 259, 227, 242, 238, 253, 221, 217. 


Use an appropriate t-test to examine, at the 5% 
level of significance, whether there is evidence 
that the new method has led, on average, to 
higher scores. State carefully the assumptions on 
which this procedure is based. 

Provide a two-sided 95% confidence interval for 
the true difference in mean scores between the 
old and new methods. State carefully the 
interpretation of this interval. (MEI) 


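Summary

Hypothesis tests (z-tests and t-tests)

• For stages in a hypothesis test, see page 513.
• For critical values and rejection criteria for a z-test, see page 513.
• Standardised test statistics:

Test 1: Testing an unknown population mean $\mu$, $H_0: \mu = \mu_0$

When $\sigma^2$ is known:

1a $X$ is normally distributed, $X \sim N(\mu, \sigma^2)$. For samples of size $n$ (any size), $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
Test statistic $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.

1b $X$ is not normally distributed. For large samples of size $n$, by the central limit theorem, $\bar{X} \approx N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
Test statistic $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.

When $\sigma^2$ is unknown:

1c $X$ is preferably normally distributed. For large $n$, $\bar{X} \approx N\left(\mu, \dfrac{\hat{\sigma}^2}{n}\right)$.
Test statistic $Z = \dfrac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}$, where $Z \sim N(0, 1)$.

1d $X$ is normally distributed, $X \sim N(\mu, \sigma^2)$. For small $n$,
Test statistic $T = \dfrac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}$, where $T \sim t(n - 1)$.

Test 2: Testing a binomial proportion $p$, where $X \sim B(n, p)$

$X$ is the number of successes in $n$ trials.
If $n$ is large such that $np > 5$ and $nq > 5$, then $X \approx N(np, npq)$.
Test statistic $Z = \dfrac{X - np}{\sqrt{npq}}$, where $Z \sim N(0, 1)$. Remember to use a continuity correction ($\pm 0.5$).

Test 3: Testing $\mu_1 - \mu_2$, the difference between means of two normal distributions

3a $\sigma_1^2$, $\sigma_2^2$ known
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$
Test statistic $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$, where $Z \sim N(0, 1)$.

3b Common population variance $\sigma^2$ known
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$
Test statistic $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $Z \sim N(0, 1)$.

3c Common population variance $\sigma^2$ unknown
Use $\hat{\sigma}^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$ ($s_1^2$, $s_2^2$ sample variances).

When the samples are large:
Test statistic $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $Z \sim N(0, 1)$.

When the samples are small:
Test statistic $T = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $T \sim t(n_1 + n_2 - 2)$.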
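Of the summary formulas, Test 2 is the one where a detail (the continuity correction) is easily forgotten. The sketch below is one way to apply it, assuming the convention that the observed count is moved 0.5 towards $np_0$ before standardising; the function name `binomial_z` and the figures in the usage line are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import sqrt


def binomial_z(x, n, p0):
    """Test 2: normal approximation to B(n, p0) with a continuity correction.

    The count x is moved 0.5 towards the mean n*p0 before standardising.
    The approximation is intended for n*p0 > 5 and n*(1 - p0) > 5.
    """
    mean = n * p0
    sd = sqrt(n * p0 * (1 - p0))
    if x > mean:
        x_cc = x - 0.5
    elif x < mean:
        x_cc = x + 0.5
    else:
        x_cc = x
    return (x_cc - mean) / sd


# Hypothetical figures: 70 successes in 200 trials, H0: p = 0.3
print(round(binomial_z(70, 200, 0.3), 3))
```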


Miscellaneous worked examples 


Example 11.16 
An inspector of items from a production line takes, on average, 21.75 seconds to check an item. After the installation of a new lighting system the times, $t$ seconds, to check each of 50 randomly chosen items from the production line are summarised by $\sum t = 1107$, $\sum t^2 = 24\,592.35$.

(a) Calculate an unbiased estimate of the population variance of the time taken to check an item under the new lighting system.

(b) Test at the 5% significance level whether there is evidence that the population mean time has changed from 21.75 seconds.

A technician who carried out the above test concluded with the following incorrect statement.

'It is not necessary for the population to be normal since the sample size is large and the central limit theorem states that any sufficiently large sample is normal.'

Give a corrected version. (C)


Solution 11.16

Let $T$ be the time, in seconds, to check an item.

(a) $$\hat\sigma^2 = \frac{1}{n-1}\left(\sum t^2 - \frac{(\sum t)^2}{n}\right) = \frac{1}{49}\left(24\,592.35 - \frac{1107^2}{50}\right) = 1.7014\ldots = 1.70\ (3\text{ s.f.})$$

(b) Let $\mu$ be the population mean time.

$H_0: \mu = 21.75$ (the population mean has not changed)
$H_1: \mu \ne 21.75$ (the population mean has changed)

Since $n$ is large, by the central limit theorem, $\bar T$ is approximately normal, and
$\bar T \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
According to $H_0$, $\mu = 21.75$.

Since $\sigma^2$ is unknown, $\hat\sigma^2$ is used instead, so
$\bar T \sim N\left(21.75, \dfrac{1.70}{50}\right)$ approximately.

Carry out a two-tailed test at the 5% level and reject $H_0$ if $|z| > 1.96$, where $z = \dfrac{\bar t - \mu}{\hat\sigma/\sqrt{n}}$.

From the sample, $\bar t = \dfrac{\sum t}{n} = \dfrac{1107}{50} = 22.14$, so
$$z = \frac{22.14 - 21.75}{\sqrt{1.70}/\sqrt{50}} = 2.11\ldots$$

Since $|z| > 1.96$, reject $H_0$.

There is evidence, at the 5% level, that the population mean time has changed from 21.75 seconds.

Corrected statement: 'It is not necessary for the population to be normal since the sample size is large and the central limit theorem states that the distribution of sample means is approximately normal for large sample size $n$.'


NOTE: the variable in Example 11.16 was given as $T$. Do not confuse it with the standardised statistic $T$ used with the t-distribution.
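The arithmetic in Solution 11.16 can be reproduced with Python's standard library as a quick check. This is a sketch, not the book's method: it takes the critical value from `statistics.NormalDist` instead of tables and keeps $\hat\sigma^2$ unrounded, so $z$ comes out at about 2.114 rather than the value obtained with 1.70.

```python
import math
from statistics import NormalDist

# Summary data from Example 11.16
n, sum_t, sum_t2 = 50, 1107, 24592.35
mu0 = 21.75  # population mean under H0

# (a) unbiased estimate of the population variance
var_hat = (sum_t2 - sum_t ** 2 / n) / (n - 1)

# (b) n is large, so by the central limit theorem the sample mean is
# approximately normal; var_hat replaces the unknown sigma^2
t_bar = sum_t / n
z = (t_bar - mu0) / math.sqrt(var_hat / n)
z_crit = NormalDist().inv_cdf(0.975)  # two-tailed 5% critical value, about 1.96
reject_h0 = abs(z) > z_crit
print(f"var_hat = {var_hat:.4f}, mean = {t_bar:.2f}, z = {z:.3f}, reject H0: {reject_h0}")
```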


Example 11.17

The random variable $X$ is distributed $N(\mu, 3.5^2)$. A test of the null hypothesis $\mu = 15$ against the alternative hypothesis $\mu > 15$ is required and the probability of a Type I error should be 0.05. A random sample of 30 observations on $X$ is taken.

(a) Find the critical region for the sample mean $\bar X$.
The mean of the sample was 16.00.

(b) Find a 95% confidence interval for $\mu$.
(c) Find P(Type II error) for the test in part (a) when $\mu = 17$.

The size of the sample is increased but P(Type I error) is still 0.05.

(d) State what effect this change will have on the critical value for $\bar X$ and on P(Type II error). (L)


Solution 11.17

P(Type I error) = P($H_0$ is rejected when $H_0$ is true) = 0.05.
So the significance level of the test is 5%.
$X \sim N(\mu, 3.5^2)$
$H_0: \mu = 15$
$H_1: \mu > 15$
According to $H_0$, $X \sim N(15, 3.5^2)$.
So, for a sample of size 30,
$\bar X \sim N\left(15, \dfrac{3.5^2}{30}\right)$

(a) Using a one-tailed (upper tail) test at the 5% level, reject $H_0$ if $z > 1.645$, where $z = \dfrac{\bar x - 15}{3.5/\sqrt{30}}$.

So the critical (rejection) region for $\bar X$ is given by
$$\frac{\bar x - 15}{3.5/\sqrt{30}} > 1.645$$
i.e. $\bar x > 15 + 1.645 \times \dfrac{3.5}{\sqrt{30}} = 16.05\ldots$

Critical region: $\bar X > 16.05$ (2 d.p.)

(b) 95% confidence limits for $\mu$:
$$\bar x \pm 1.96\frac{\sigma}{\sqrt{n}} = 16.00 \pm 1.96 \times \frac{3.5}{\sqrt{30}} = 16.00 \pm 1.252\ldots$$

95% confidence interval $= (16.00 - 1.252\ldots,\ 16.00 + 1.252\ldots) = (14.7, 17.3)$ (1 d.p.).


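(c) From part (a), $H_0$ is accepted if $\bar x < 16.05$.

P(Type II error) = P($H_0$ is accepted when $H_1$ is true)
$= P\left(\bar X < 16.05 \text{ when } \bar X \sim N\left(17, \dfrac{3.5^2}{30}\right)\right)$
$= P\left(Z < \dfrac{16.05 - 17}{3.5/\sqrt{30}}\right)$
$= P(Z < -1.487)$
$= 1 - 0.9315$
$= 0.0685$
So P(Type II error) ≈ 7%.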


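(d) If $n$ is increased but P(Type I error) stays at 0.05, the critical value for $\bar X$ will decrease and P(Type II error) will also decrease.

This is illustrated in the following diagrams.

[Diagram, $n = 30$: the distributions of $\bar X$ under $H_0$, $\bar X \sim N\left(15, \frac{3.5^2}{30}\right)$, and under $H_1$, $\bar X \sim N\left(17, \frac{3.5^2}{30}\right)$, with the region 'Accept $H_0$' to the left of the critical value and the area under the $H_1$ curve in that region shaded as P(Type II error).]

[Diagram, $n > 30$: the curves are more squashed (narrower), so the critical value moves closer to 15 and the shaded P(Type II error) area is smaller.]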
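The calculations in Solution 11.17, including the effect described in (d), can be checked with the short sketch below. It assumes Python 3.8 or later for `statistics.NormalDist`, and the function names are illustrative. Because it keeps the critical value unrounded (16.051...) rather than 16.05, P(Type II error) comes out at about 0.069, which agrees with the figure of about 7% above.

```python
import math
from statistics import NormalDist

mu0, mu1, sigma = 15, 17, 3.5  # H0 mean, H1 mean used in (c), known standard deviation
std_normal = NormalDist()


def critical_value(n, alpha=0.05):
    """Upper-tail critical value for the sample mean: reject H0 if xbar exceeds it."""
    return mu0 + std_normal.inv_cdf(1 - alpha) * sigma / math.sqrt(n)


def type_ii_error(n, alpha=0.05):
    """P(accept H0 when mu = mu1) = P(xbar < critical value | xbar ~ N(mu1, sigma^2/n))."""
    return NormalDist(mu1, sigma / math.sqrt(n)).cdf(critical_value(n, alpha))


n = 30
se = sigma / math.sqrt(n)
half_width = std_normal.inv_cdf(0.975) * se
ci = (16.00 - half_width, 16.00 + half_width)  # (b) 95% confidence interval

print(f"(a) critical value {critical_value(n):.2f}")
print(f"(b) CI ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"(c) P(Type II) {type_ii_error(n):.4f}")
# (d) a larger n with the same alpha lowers both the critical value and P(Type II)
print(f"(d) n = 60: {critical_value(60):.2f}, {type_ii_error(60):.4f}")
```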


Example 11.18
When cars arrive at a certain T-junction they turn either right or left. Part of a study of road usage involved deciding between the following alternatives.
Cars are equally likely to turn right or left.
Cars are more likely to turn right than left.

(a) State suitable null and alternative hypotheses, involving a probability, for a significance test.
(b) Out of a random sample of 40 cars, $n$ turned right. Use a suitable approximation to find the least value of $n$ for which the null hypothesis will be rejected at the 2% significance level.
(c) For the test described in (b), calculate the probability of making a Type II error when, in fact, 80% of all cars arriving at the junction turn right. (C)

Solution 11.18

Let $X$ be the number of cars in 40 that turn right. Then $X \sim B(40, p)$.
(a) $H_0: p = 0.5$
$H_1: p > 0.5$
(b) According to $H_0$, $X \sim B(40, 0.5)$.
The number of trials, 40, is large and $np = 40 \times 0.5 = 20 > 5$, $nq = 40 \times 0.5 = 20 > 5$.
Since $np > 5$ and $nq > 5$, use the normal approximation
$X \sim N(np, npq)$, with $npq = 40 \times 0.5 \times 0.5 = 10$,
i.e. $X \sim N(20, 10)$.
Use a one-tailed (upper tail) test at the 2% level and reject $H_0$ if $P(X \ge n) < 2\%$. With the continuity correction this becomes $P(X > n - 0.5) < 2\%$,
i.e. reject $H_0$ if $P\left(Z > \dfrac{n - 0.5 - 20}{\sqrt{10}}\right) < 0.02$.

From tables, $P(Z > 2.054) = 0.02$, so
$$\frac{n - 0.5 - 20}{\sqrt{10}} > 2.054$$
$$n > 20.5 + 2.054 \times \sqrt{10}$$
$$n > 26.995\ldots$$
Since $n$ is an integer, the least value of $n$ is 27.
$H_0$ is rejected if $X \ge 27$.

(c) $H_0: p = 0.5$
$H_1: p = 0.8$
Under $H_1$, $X \sim B(40, 0.8)$.
$np = 40 \times 0.8 = 32 > 5$, $nq = 40 \times 0.2 = 8 > 5$,
so, using the normal approximation, $X \sim N(np, npq)$ with $npq = 40 \times 0.8 \times 0.2 = 6.4$,
i.e. $X \sim N(32, 6.4)$.
P(Type II error) = P($H_0$ is accepted when $H_1$ is true).

$H_0$ is accepted if $X \le 26$, i.e. if $X < 26.5$ (continuity correction), so
P(Type II error) $= P(X < 26.5 \text{ when } X \sim N(32, 6.4))$
$= P\left(Z < \dfrac{26.5 - 32}{\sqrt{6.4}}\right)$
$= P(Z < -2.174)$
$= 1 - 0.9852$
$= 0.0148$
$\approx 1.5\%$

[Diagram: the distribution $X \sim N(32, 6.4)$, with the area to the left of $x = 26.5$ ($z = -2.174$) shaded as P(Type II error).]

Example 11.19
The volume of paint, in litres, in a randomly chosen two-litre can is denoted by the random variable $V$. Ten random observations of $V$ are taken, with the following results.

2.12 2.03 2.07 1.99 1.95 2.01 2.00 2.08 1.94 1.99

Assuming a normal distribution, use a single-sample t-test, at the 10% significance level, to test the claim that the mean volume of paint in two-litre cans is not 2 litres.

… are summarised by
$\sum w = 38.1$ and $\sum (w - \bar w)^2 = 0.060\,40$,
where $w$ is a volume of paint, in litres, in a can. Use a two-sample t-test to test whether there is evidence, at the 10% significance level, that the new machine is dispensing less paint than the old machine.

State one condition required of the two populations for the two-sample t-test to be valid. (C)


Solution 11.19

Let $V$ be the volume (in litres) of paint in a can.

$V \sim N(\mu_V, \sigma^2)$ with $\mu_V$ and $\sigma^2$ unknown.

Sample readings: $n = 10$, $\bar v = 2.018$, $s_V^2 = 0.002\,976$ (using a calculator).
An unbiased estimate of $\sigma^2$ is $\hat\sigma^2 = \dfrac{n}{n-1}s_V^2 = \dfrac{10}{9}(0.002\,976) = 0.003\,31$, so $\hat\sigma = 0.0575$.

$H_0: \mu_V = 2$
$H_1: \mu_V \ne 2$

According to $H_0$, the standardised statistic $T$ is such that
$T = \dfrac{\bar V - 2}{\hat\sigma/\sqrt{n}}$ and $T \sim t(9)$.
Carry out a two-tailed test at the 10% level.
Reject $H_0$ if $|t| > 1.833$ (see tables on page 650, $\nu = 9$, $p = 0.95$).
$$t = \frac{2.018 - 2}{0.0575/\sqrt{10}} = 0.989\ldots$$

Since $|t| < 1.833$, do not reject $H_0$.
There is not enough evidence to say that the mean volume is not two litres.
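As a check on the first part of Solution 11.19, the sketch below recomputes $\bar v$, $\hat\sigma^2$ and $t$ from the ten readings. It uses Python's `statistics.variance`, which already divides by $n - 1$, so the step $\hat\sigma^2 = \frac{n}{n-1}s_V^2$ is built in. The critical value 1.833 is copied from the tables rather than computed, since the standard library has no t-distribution.

```python
import math
from statistics import mean, variance

volumes = [2.12, 2.03, 2.07, 1.99, 1.95, 2.01, 2.00, 2.08, 1.94, 1.99]
n = len(volumes)
v_bar = mean(volumes)
var_hat = variance(volumes)  # divisor n - 1: the unbiased estimate of sigma^2
t = (v_bar - 2) / math.sqrt(var_hat / n)

T_CRIT = 1.833  # t-table value for nu = 9, p = 0.95, as used in the solution
reject_h0 = abs(t) > T_CRIT
print(f"mean = {v_bar:.3f}, var_hat = {var_hat:.5f}, t = {t:.3f}, reject H0: {reject_h0}")
```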


Let $W$ be the volume, in litres, dispensed by the new machine.
Assume that $W$ has a normal distribution and that the two populations have a common variance $\sigma^2$. This time the two samples are combined to give a two-sample estimate of $\sigma^2$:
$$\hat\sigma^2 = \frac{n_V s_V^2 + n_W s_W^2}{n_V + n_W - 2}$$
where $s_V^2 = 0.002\,976$ (from the first part of the question) and
$n_W s_W^2 = \sum (w - \bar w)^2 = 0.060\,40$, so
$$\hat\sigma^2 = \frac{10 \times 0.002\,976 + 0.060\,40}{10 + 20 - 2} = 0.003\,22$$
$\hat\sigma = 0.0567$ (3 s.f.)

$H_0: \mu_V - \mu_W = 0$ (the machines dispense the same amount)
$H_1: \mu_V - \mu_W > 0$ (the new machine is dispensing less than the old)

The test statistic is $T = \dfrac{(\bar V - \bar W) - 0}{\hat\sigma\sqrt{\dfrac{1}{n_V} + \dfrac{1}{n_W}}}$, where $T \sim t(n_V + n_W - 2)$, i.e. $T \sim t(28)$.

Carry out a one-tailed test at the 10% level. Reject $H_0$ if $t > 1.313$.

From the sample, $\bar v = 2.018$, $\bar w = \dfrac{\sum w}{n_W} = \dfrac{38.1}{20} = 1.905$, so
$$t = \frac{2.018 - 1.905}{0.0567\sqrt{\dfrac{1}{10} + \dfrac{1}{20}}} = 5.145\ldots$$

Since $t > 1.313$, reject $H_0$.
There is evidence at the 10% level that the new machine is dispensing less paint than the old machine.

The condition required: the two populations must be normal with a common variance.

Miscellaneous exercise 11e
1. The amount of nicotine, in milligrams, in a cigarette of a certain brand is normally distributed with mean $\mu$ and standard deviation 2.5. A random sample of 10 cigarettes yielded a mean nicotine value of 18.4. Obtain a symmetric 90% confidence interval for $\mu$, giving values to three significant figures.

Give a reason why the value of $\mu$ might not be inside this interval.

Test the null hypothesis $\mu = 17.8$ against the alternative hypothesis $\mu \ne 17.8$ at the 10% significance level. (C)


2. A study is made of the numbers of boys and girls in families. A random sample of families is chosen, and the total number of children is 500, of whom 261 are girls. It is desired to test the hypothesis that boys and girls are equally likely in the population against the alternative hypothesis that they are not equally likely.

(a) State an assumption necessary for these 500 children to be considered as a random sample of the population of all children.

(b) Test at the 10% significance level whether the data indicate that boys and girls are equally likely in the population.


3. A resident of an urban road claims that the average speed of vehicles using the road is greater than the 30 m.p.h. speed limit. To investigate this claim the police time a randomly selected sample of 25 vehicles over a … mile on the road. It is assumed that the speeds calculated from their observations come from a normal distribution with mean $\mu$ m.p.h. and standard deviation 12 m.p.h.

(a) State appropriate null and alternative hypotheses for a significance test.

(b) Find a critical region for a 5% significance test in the form

sample mean $\bar X > k$,

where the value of the constant $k$ is given correct to two decimal places.

(c) State, with a reason, your conclusion for the test when the mean speed calculated from the sample was 35 m.p.h.

(d) Calculate the power of the test when, in fact, $\mu = 40$. (NEAB)


4. A supermarket's statistician reports that, over the past three months, the mean amount spent per customer has been £43 with a standard deviation of £20.

The supermarket carries out a promotion for one week by offering 'buy two ... get one free' on a range of products which it sells. The management hopes that this will increase the mean amount spent per customer; you may assume that the standard deviation remains unchanged.

A random sample of 50 customers visiting the supermarket that week spent a total of £2400.

(a) Write down suitable null and alternative hypotheses in order to test whether or not the promotion has increased the average level of spending per customer.

(b) Explain carefully the use of the Central Limit Theorem in carrying out this hypothesis test.

(c) Carry out the hypothesis test at the 5% significance level, clearly stating your conclusion.

(d) Find a 90% confidence interval for the mean amount spent by customers during the period of the promotion. State, giving a reason, whether this is consistent with your conclusion in (c). (MEI)


5. The process of manufacturing a certain kind of dinner plate results in a proportion 0.13 of faulty plates. An alteration is made to the process which is intended to reduce the proportion of faulty plates. State suitable null and alternative hypotheses for a statistical test of the effectiveness of the alteration.

In order to carry out the test, the quality control department count the number of faulty plates in a random sample of 2500. If 290 or fewer faulty plates are found then it will be accepted that the alteration does result in a reduction in the proportion of faulty plates. Calculate the significance level of this test, using a suitable normal approximation.

Calculate the probability of making a Type II error in the above test, given that the alteration results in a decrease in the proportion of faulty plates to 0.11. (C)


6. Water from a cooling tower at a power station is discharged into a river. In order to test whether the mean temperature of discharged water is greater than the permitted maximum of 65 °C, the temperature ($x$ °C) of 40 randomly selected samples of water will be taken and the sample mean $\bar x$ used to test the null hypothesis $\mu = 65$ against the alternative hypothesis $\mu > 65$, where $\mu$ °C is the population mean temperature of discharged water. It may be assumed that the population standard deviation of $x$ is 5.0.

(a) State, in the context of the question, what you understand by
(i) a Type I error,
(ii) a Type II error.

(b) The probability of a Type I error is fixed at 0.1. Show that the range of values of $\bar x$ for which the null hypothesis is rejected is given by $\bar x > 66.01$, correct to two decimal places.

(c) State the conclusion of the test when $\bar x = 65.7$, and the type of error that might be made in this case.

(d) Calculate the probability of making a Type II error when, in fact, $\mu = 68$.

What can be deduced about the probability of making a Type II error when, in fact, $\mu > 68$? (C)


7. The manager of a large supermarket wishes to judge the effect of a new layout on the customers. On the day that the layout was introduced the first 200 customers in the store were asked whether or not they approved of the new layout.

Comment on the manner in which the sample was chosen, and suggest a way of obtaining a more suitable sample.

Out of a suitably chosen sample of 200 customers, 148 approved of the new layout. Calculate an approximate 95% confidence interval for the population percentage of customers who approve of the new layout.

The supermarket manager claims that 80% of customers approve of the new layout. Show that the data provide evidence at the 2½% significance level that the population percentage is less than 80%. (C)


8. The random variable $X$ has a normal distribution with mean $\mu$ (unknown) and variance $\sigma^2$ (known). To test the null hypothesis $H_0: \mu = \mu_0$, a random sample of $n$ observations of $X$ is taken, and the sample mean is $\bar x$. Find, in terms of $\mu_0$, $\sigma$ and $n$, the set of values of $\bar x$ which will result in each of the following:

(a) $H_0$ being rejected in favour of $H_1: \mu \ne \mu_0$ at the 5% level of significance,

(b) $H_0$ not being rejected in favour of $H_1: \mu < \mu_0$ at the 1% level of significance.


9. The masses of components used in making a model car are being checked. Each of a random sample of 200 components is weighed and the masses, $x$ g, are summarised by

$n = 200$, $\sum x = 1484.2$, $\sum x^2 = 11\,098.19$.

(a) Calculate an unbiased estimate of the population variance.

(b) State what you understand by 'unbiased estimate'.

The components are produced in large batches. It is desired that the mean mass of components in a batch should be at least 7.40 g. In order to decide whether to accept or reject a batch, each of a random sample of 50 components from the batch is weighed. The sample data is used to perform a test of the null hypothesis $\mu = 7.40$ against the alternative hypothesis $\mu < 7.40$, where $\mu$ g is the mean mass of components in the batch. For the test the population variance is taken to have the value found in part (a). The batch is rejected if the null hypothesis is rejected using a 2½% significance level. Show that the batch will be rejected if the sample mean mass is less than 7.22 g.

For one such batch the sample data is summarised by $n = 50$, $\sum x = 366.0$.

Determine whether or not this batch is rejected.

Calculate the probability of making a Type II error in carrying out the above test for a batch whose mean mass is actually 7.10 g. (C)

10. A box of dice contains some which are unbiased and some which are biased in such a way that the probability of throwing a six with one of these dice is … . One die is selected at random from the box and, in order to decide whether it is biased, it is thrown 240 times and the number of sixes, $N$, is counted. The probability of throwing a six with this die is denoted by $p$. The null hypothesis $p = \frac{1}{6}$ is tested against an appropriate alternative hypothesis at the 5% significance level.

(a) State an appropriate alternative hypothesis.

(b) Find the set of values of $N$ for which it is accepted that the die is biased.

(c) Find the probability of making a Type II error in the test. [$\Phi(3.355) = 0.9996$] (C)

11. A manufacturer makes two grades of squash ball: 'slow' and 'fast'. Slow balls have a 'bounce' (measured under standard conditions) which is known to be a normal variable with mean 10 cm and standard deviation 2 cm. The 'bounce' of fast balls is a normal variable with mean 15 cm and standard deviation 2 cm. A box of balls is unlabelled so that it is not known whether they are all slow or all fast.
Devise a test, based on an observation of the mean bounce of a sample of four balls from the box, such that the Type I error is 0.05, and state the magnitude of the Type II error for this test. (C)

12. An ambulance station serves an area which includes more than 10 000 houses. It has been decided that if the mean distance of the houses from the ambulance station is greater than ten miles then a new ambulance station will be necessary. The distance, $x$ miles, from the station of each of a random sample of 200 houses was measured, the results being summarised by:
$\sum x = 2092.0$ and $\sum x^2 = 24\,994.5$.

(a) Calculate, to four significant figures, unbiased estimates of
(i) the population mean distance, $\mu$ miles, of the houses from the station,
(ii) the population variance of the distances of the houses from the station.
State what you understand by the term 'unbiased estimate'.
(b) Using the sample data, a significance test of the null hypothesis $\mu = 10$ against the alternative hypothesis $\mu > 10$ is carried out at the $\alpha$% significance level. In the test, the sample mean is compared with the critical value of 10.65; as the sample mean is less than 10.65 the null hypothesis is not rejected. Calculate the value of $\alpha$.
(c) Give a reason why it is not necessary for the distances to be normally distributed for the test to be valid. (C)


13. A particular investigation concentrated on people recently re-employed following a first period of unemployment. Each of a random sample of 50 such persons was asked the duration, in months, of this period of unemployment. A summary of the results is as follows.

mean = 16.7 months, variance = 193.21 months²

Investigate at the 5% level of significance the claim that, for people re-employed after a first period of unemployment, the mean duration of unemployment is more than 12 months.

Indicate why, in carrying out your test, no assumption regarding the distribution of the duration of the first period of unemployment is necessary. (NEAB)


14. The error in the readings made on a measuring instrument can be modelled by the continuous random variable $X$ which has mean $\mu$ and standard deviation $\sigma$. If the instrument is correctly calibrated then $\mu = 0$.

In order to check the calibration of the instrument, the errors in a random sample of 40 readings were determined. These data are summarised by:

$\sum x = 120$, $\sum x^2 = 3285$.

(a) Estimate $\sigma^2$.

(b) Carry out a hypothesis test, at the 5% level of significance, to test whether the machine is, or is not, correctly calibrated. You should state your hypotheses and conclusions carefully.

(c) Obtain a symmetric 95% confidence interval for $\mu$, explaining why it is only approximate.

(d) Suppose the data from the 40 readings had been such that the estimate of $\sigma^2$ as found in part (a) was larger, but without changing the sample mean. State the effect this would have on the value of the test statistic in part (b). Explain why this might affect the conclusion to part (b). (MEI)


15. A study of the annual rainfall, $x$ cm to the nearest centimetre, over the last 20 years for a small town gave the following results:

$\sum x = 1325$, $\sum x^2 = 90\,316$.

(a) Find unbiased estimates of the mean and the variance of the annual rainfall for this town.

Archive records show that the annual rainfall for this town, prior to this period, had a mean of 62.50 cm and a standard deviation of $\sigma$ = 11.45 cm.

(b) Assuming that the standard deviation remains unchanged at 11.45 cm, test at the 5% level of significance whether or not there is evidence of an increase in mean annual rainfall over the last 20 years. State your hypotheses clearly. (L)


16. In 1978 the Borsetshire County Council tree officer did a survey of a random sample of 64 separate areas, each 1 km square, and found an average of 19.5 diseased trees per square. The following year, to test whether the disease had spread, she took a new random sample of 36 separate areas, also each 1 km square, and found an average of 21.7 diseased trees per square.

(a) Assume that, in both years, the number of diseased trees per 1 km square had a normal distribution with population variance 18.2. Test, at the 1% significance level, the hypothesis that the mean number of diseased trees per 1 km square in 1979 was the same as in 1978, against the hypothesis that the mean number had increased.

(b) Further evidence suggests that the number of diseased trees is not normally distributed. Say what changes you might have to make, if any, to the test you have carried out, explaining the reasons for your answer. Do not carry out any further tests. (C)


17. When watching games of men's basketball, I have noticed that the players are often tall. I am interested to find out whether or not men who play basketball really are taller than men in general.

I know that the heights, in metres, of men in general have the distribution $N(1.73, 0.08^2)$. I make the assumption that the heights, $X$ metres, of male basketball players are also normally distributed, with the same variance as the heights of men in general, but possibly with a larger mean.

(a) Write down the null and alternative hypotheses under test.

I propose to base my test on the heights of the eight male basketball players who recently played for our local team, and I shall use a 5% level of significance.

(b) Write down the distribution of the sample mean height of a random sample of size 8 drawn from the distribution of $X$, assuming that the null hypothesis is true.
(c) Determine the critical region for my test, illustrating your answer with a sketch.
(d) Carry out the test, given that the mean height of the eight players is 1.765 m. You should present your conclusions carefully, stating any additional assumption you need to make.

In fact, the distribution of $X$ is $N(1.80, 0.06^2)$.

(e) Find the probability that a test based on a random sample of size 8 and using the critical region in part (c) will lead to the conclusion that male basketball players are not taller than men in general. (MEI)


18. The ingredients for concrete are mixed together to obtain a mean breaking strength of 2000 newtons. If the mean breaking strength drops below 1800 newtons then the composition must be changed. The distribution of the breaking strength is normal with standard deviation 200 newtons.

Samples are taken in order to investigate the hypotheses:

$H_0: \mu = 2000$ newtons
$H_1: \mu = 1800$ newtons

How many samples must be tested so that P(Type I error) = 0.05 and P(Type II error) = 0.1?
Test 11A (z-tests)

1. Cans of lemonade are filled by a machine which is set to dispense a mean amount of 330 ml into each can. The manufacturer suspects that the machine is tending to over-dispense and, in order to test the suspicion, measures the contents, $x$ ml, of a random sample of 30 cans. The results are summarised by:

$\sum x = 9925$, $\sum x^2 = 3\,284\,137$.

(a) Calculate an unbiased estimate of the population variance of the amount dispensed into each can. Give four significant figures in your answer.

(b) Test the manufacturer's suspicion at the 10% significance level.

(c) Indicate where the central limit theorem is used in the test, and state why the use of the central limit theorem is necessary. (C)

2. The proportion of patients who suffer an allergic reaction to a certain drug used to treat a particular medical condition is assumed to be 0.045.

When 400 patients were treated, 25 suffered an allergic reaction. Using a normal approximation, test at the 5% significance level whether the quoted figure of 0.045 is an underestimate. (C)

3. (a) A null hypothesis $H_0$ is to be tested against an alternative hypothesis $H_1$. Explain what is meant by:
(i) a Type I error,
(ii) a Type II error.

(b) The tar yields in cigarettes of a particular brand are distributed normally with mean $\mu$ mg and standard deviation 0.8 mg. In order to test $H_0: \mu = 17.5$ against $H_1: \mu > 17.5$ at the 1% level of significance, a random sample of 10 cigarettes of this brand is to be obtained and the sample mean $\bar X$ calculated.

(i) In the case when the yields were recorded, in milligrams, as:

17.1, 18.3, 18.9, 17.8, 16.9, 19.2,
17.8, 18.3, 18.5, 18.2

carry out the required significance test.
(ii) Determine a critical region for the test in the form $\bar X > c$, where $c$ is a constant whose value is to be determined.

(iii) Calculate the size of the Type II error for this test when, in fact, $\mu = 18.0$. (NEAB)

4. The random variable $X$ is distributed as $N(\mu, 16)$. A random sample of size 25 is available. The null hypothesis $\mu = 0$ is to be tested against the alternative hypothesis $\mu \ne 0$. The null hypothesis will be accepted if $-1.5 < \bar x < 1.5$, where $\bar x$ is the value of the sample mean; otherwise it will be rejected. Calculate the probability of a Type I error. Calculate the probability of a Type II error if in fact $\mu = 0.5$; comment on the value of this probability. (MEI)

Test 11B (z-tests)

1. A certain brand of mineral water comes in bottles. The amount of water in a bottle, in millilitres, follows a normal distribution of mean $\mu$ and standard deviation 2. The manufacturer claims that $\mu$ is 125. In order to maintain standards the manufacturer takes a sample of 15 bottles and calculates the mean amount of water per bottle to be 124.2 millilitres. Test, at the 5% level, whether or not there is evidence that the value of $\mu$ is lower than the manufacturer's claim. State your hypotheses clearly. (L)

2. A newspaper headline stated 'Majority would vote for Prime Minister'. The article explained that in a survey of 70 randomly selected people, 38 had said that they would vote for the Prime Minister. A spokesman for the opposition party said that such evidence was inconclusive, and, according to standard statistical techniques, the result was consistent with only 40% of the whole population voting for the Prime Minister.

A spokesman for the government stated that the results showed that 40% was too low.

Stating the null and alternative hypotheses, test at the 5% level which of the spokesmen was justified in his assertion. (D)


3. The playing times of a particular brand of audio tape are normally distributed with mean $\mu$ minutes and standard deviation 0.24 minutes. The manufacturer states that $\mu = 60$. A large batch of these tapes is delivered to a store and, in order to check the manufacturer's statement, the playing times of a random sample of ten tapes were measured. The null hypothesis $\mu = 60$ is tested against the alternative hypothesis $\mu < 60$ at the 1% significance level.
(a) Find the range of values of the sample mean $\bar X$ for which the null hypothesis is rejected, giving two decimal places in your answer.

(b) State what 'a Type II error has occurred' means in the context of the playing times of the tapes.
(c) Calculate the probability of making a Type II error when, in fact, $\mu = 59.7$. (C)


4. The top 40 chart of the Recorded Music Association has been compiled every week for some years, and the standard deviation of the number of weeks which a record spends at Number One in the chart has been found to be 0.87 weeks. The number of weeks which the last ten Number One records featuring female singers spent in the Number One position are:

3, 1, 1, 2, 1, 2, 3, 2, 1, 1.

For the last 15 Number One records featuring male singers, the data are:

1, 1, 2, 2, 2, 3, 4, 2, 1, 2, 3, 5, 1, 2, 3.

A music industry producer wished to test whether there was any difference in the time spent at Number One between female and male singers. She assumes that both the distributions from which the two samples are drawn are normal with standard deviation 0.87 weeks.

(a) State the null and alternative hypotheses she must use.

(b) Carry out the test at the 5% level of significance.

(c) Give a reason why her assumption of normality may be invalid. (L)

Test 11C (t-tests)

1. Six cleaning firms were selected at random and asked about their hourly rates of pay, with the following results:

7.00, 6.80, 6.62, 6.94, 7.48, 7.04
[$\sum x = 41.88$, $\sum x^2 = 292.74$]

Carry out a t-test at the 1% significance level to test whether the mean hourly rate of pay, paid by cleaning firms, falls below the minimum of $7.40.
State clearly an assumption made in using the t-test in this context.

2. On a certain day in July the maximum temperature, $m$ °C, was recorded at 11 points chosen at random on the island of San Marco. On the same day the maximum temperature, $p$ °C, was recorded at 20 points chosen at random on the island of San Polo. The results are summarised by:

$\bar m = 25.30$, $\bar p = 26.45$,
$\sum (m - \bar m)^2 = 16.74$, $\sum (p - \bar p)^2 = 15.29$.

Test, at the 2.5% significance level, the claim that San Marco was cooler than San Polo on that day, giving your null and alternative hypotheses and stating any assumptions necessary for your test to be valid. (C)

3. The contents of a packet of crisps are marked as 30 g. The manufacturer believes that one of their machines is faulty and is issuing too many crisps per packet. A sample of 10 packets is selected at random from this machine and the masses of their contents were:

31.5, 28.9, 30.5, 32.2, 35.5,
34.2, 31.8, 32.8, 29.1, 32.1

(a) Calculate an unbiased estimate of the population variance.

(b) Is there evidence at the 10% level that the machine is issuing too many crisps per packet? State any distributional assumptions made.

(c) How would the test procedure in part (b) have differed if the population variance was known? (AQA)

4. The customers of a local branch of a bank are invited to comment on various aspects of the service. Their comments are translated into an overall 'satisfaction score'. This score can be taken as normally distributed over the whole population of customers.

A staff training programme has recently been completed. A random sample of scores before the programme was as follows:

126, 93, 114, 107, 98, 112.

A separate random sample of scores after the programme was as follows:

124, 107, 117, 136, 120, 122.

Test at the 5% level of significance the null hypothesis that the mean score is the same after the training programme as it was before against the alternative that the new mean score is higher, stating your assumption concerning the underlying variances. Provide a two-sided 99% confidence interval for the true difference in mean scores. (MEI)


The χ² significance test

In this chapter you will learn
• about the $\chi^2$ distribution
• how to use $\chi^2$ tables and work out the number of degrees of freedom, $\nu$
• how to perform a $\chi^2$ goodness-of-fit test for the following:
  Test 1: a uniform distribution
  Test 2: a distribution in a given ratio
  Test 3: a binomial distribution
  Test 4: a Poisson distribution
  Test 5: a normal distribution
• how to perform a $\chi^2$ test for independence between variables, using a contingency table
• about applying Yates' continuity correction when $\nu = 1$.

You will need to know how to calculate probabilities for the binomial, Poisson and normal distributions.

THE χ² SIGNIFICANCE TEST


There are two main situations when a $\chi^2$ significance test is used:

(a) A test for goodness of fit. This is used when you have some practical data and you want to know how well a particular theoretical distribution, such as a binomial or a normal, fits the data. The null hypothesis $H_0$ is that the particular distribution does provide a model for the data; the alternative hypothesis $H_1$ is that it does not.

(b) A test for independence (or for association). This is used when you have some practical data concerning two variables and you want to know whether they are independent or whether there is an association between them. The null hypothesis $H_0$ is that the factors are independent; the alternative hypothesis $H_1$ is that they are not.


In both situations a test statistic is calculated. This is often written X², and, subject to certain conditions, its distribution can be approximated by a χ² distribution. Before looking in detail at how the test statistic is calculated and how to perform the test, consider some of the features of the χ² distribution.


The χ² distribution


The χ² distribution has one parameter, ν, pronounced ‘new’, and the shape of the distribution is different for different values of ν. Here are some examples.

[Figure: graphs of f(x) for χ² distributions with different values of ν, including ν = 1]


Some features of the χ² distribution are:

(a) It is reverse J-shaped for ν = 1 and ν = 2.
(b) It is positively skewed for ν > 2.
(c) The larger the value of ν, the more symmetric the distribution becomes.
(d) When ν is large, the distribution is approximately normal.
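Features (b) to (d) can be illustrated numerically. The sketch below relies on three standard results that this chapter does not state, so treat them as assumptions here: a χ²(ν) distribution has mean ν, variance 2ν and coefficient of skewness √(8/ν). The helper name `chi2_moments` is illustrative only. The skewness falls from about 2.83 when ν = 1 to 0.2 when ν = 200, moving towards 0, the value for a symmetric distribution such as the normal.

```python
import math


def chi2_moments(v):
    """Return (mean, variance, skewness) of the chi-squared distribution with v degrees of freedom."""
    mean = v
    variance = 2 * v
    skewness = math.sqrt(8 / v)  # always positive: the distribution is positively skewed
    return mean, variance, skewness


for v in (1, 2, 4, 10, 50, 200):
    mean, variance, skewness = chi2_moments(v)
    print(f"v = {v:3d}: mean = {mean}, variance = {variance}, skewness = {skewness:.3f}")
```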


Degrees of freedom, ν

The parameter ν is known as the number of degrees of freedom and it is the number of independent variables used in calculating the test statistic. Details of how to find ν are given in the following text and in the summary tables on pages 579 and 590.


Critical values and levels of significance

The χ² test is conducted as a one-tailed (upper tail) test. When carrying out the test, you will want to know whether the calculated value of the test statistic lies in the main bulk of the χ² distribution or whether it is in the upper-tail critical (or rejection) region. The boundary of the critical region is called the critical value.

The critical value depends on the level of significance of the test. Often a 5% or a 1% level of significance is used, and the critical values can be found from χ² tables. For example, for a 5% level, the critical value is such that 5% of the area is in the upper tail, and the critical value is written $\chi^2_{5\%}(\nu)$, for a particular value of ν.

[Figure: χ²(ν) curve with the upper 5% tail, to the right of $\chi^2_{5\%}(\nu)$, shaded as the critical region]

If the test value lies in the critical region, then the null hypothesis H₀ is rejected in favour of the alternative hypothesis H₁.
χ² tables

χ² tables are usually given in one of two formats, and you must make sure that you are familiar with the format that you will be given in an examination. In each case the rows refer to the value of ν.

Format 1: χ² tables giving lower-tail probabilities

In this format the column headings indicate the area in the lower tail. For a 5% significance level, since there would be 5% in the upper tail, there would be 95% in the lower tail, so look for the column headed 0.95.

The tables give the lower-tail probability; the 5% level uses the 0.95 column and the 1% level uses the 0.99 column.

ν   0.01        0.025       0.05       0.9     0.95    0.975   0.99    0.995   0.999
1   0.0001571   0.0009821   0.003932   2.706   3.841   5.024   6.635   7.879   10.83
2   0.02010     0.05064     0.1026     4.605   5.991   7.378   9.210   10.60   13.82
3   0.1148      0.2158      0.3518     6.251   7.815   9.348   11.34   12.84   16.27
4   0.2971      0.4844      0.7107     7.779   9.488   11.14   13.28   14.86   18.47

Examples (highlighted in the extract)

(a) For a significance level of 5% and 4 degrees of freedom, the critical value $\chi^2_{5\%}(4) = 9.488$.

(b) For a significance level of 1% and 2 degrees of freedom, the critical value $\chi^2_{1\%}(2) = 9.210$.


Format 2: χ² tables giving upper-tail probabilities

(This is the format printed on page 651.)

In this format the columns indicate the area in the upper tail, i.e. they give the significance level of the test. For example, the column headed 0.05 gives the critical value such that 5% of the area lies in the upper tail.

The tables give the upper-tail probability; the 10%, 5% and 1% levels use the columns headed 0.100, 0.050 and 0.010.

ν   0.990   0.975   0.950   0.100   0.050   0.025   0.010    0.005
1   0.000   0.001   0.004   2.706   3.841   5.024   6.635    7.879
2   0.020   0.051   0.103   4.605   5.991   7.378   9.210    10.597
3   0.115   0.216   0.352   6.251   7.815   9.348   11.345   12.838
4   0.297   0.484   0.711   7.779   9.488   11.143  13.277   14.860

Examples (highlighted in the extract)

(c) For a significance level of 5% and 3 degrees of freedom, $\chi^2_{5\%}(3) = 7.815$.

(d) For a significance level of 10% and 2 degrees of freedom, $\chi^2_{10\%}(2) = 4.605$.
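If you want to check a table entry, or need a value of ν that the tables do not cover, the critical value can be found numerically by solving P(X > c) = α for c. The sketch below is one way to do this in plain Python, assuming the standard series for the regularised lower incomplete gamma function (the χ²(ν) distribution function equals P(ν/2, x/2)) together with simple bisection. The names `chi2_cdf` and `chi2_critical` are hypothetical, and this is not how the printed tables were produced.

```python
import math


def chi2_cdf(x, v):
    """P(X <= x) for X ~ chi-squared(v).

    Uses the series for the regularised lower incomplete gamma function
    P(s, z) with s = v/2 and z = x/2:
        P(s, z) = z^s e^(-z) / Gamma(s) * sum_{n>=0} z^n / (s (s+1) ... (s+n))
    """
    if x <= 0:
        return 0.0
    s, z = v / 2.0, x / 2.0
    term = 1.0 / s  # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-16 * total and n < 10_000:
        term *= z / (s + n)  # next term of the series
        total += term
        n += 1
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total


def chi2_critical(alpha, v):
    """Return c such that P(X > c) = alpha for X ~ chi-squared(v), by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, v) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, v) > alpha:
            lo = mid  # too much area above mid, so c lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(0.05, 4), 3))  # 9.488, as in the tables
print(round(chi2_critical(0.01, 2), 3))  # 9.21
print(round(chi2_critical(0.95, 4), 3))  # 0.711 (the lower 5% point)
```

Passing α = 0.95 gives the lower-tail value from the 0.950 column of Format 2. If SciPy is available, `scipy.stats.chi2.isf(alpha, v)` returns the same upper-tail critical values.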


The test statistic X²

The test statistic X² uses the values of the observed (O) and expected (E) frequencies. X² is defined as

$$X^2 = \sum \frac{(O - E)^2}{E}$$

The χ² distribution can be used as an approximation for the distribution of X², provided none of the expected frequencies (E) falls below 5. You would write X² ~ χ²(ν).

X² is calculated as follows:

1. Find the difference, O − E, for each set of values.

2. Square each difference to obtain (O − E)². This gives due weight to any particularly large differences and also means that all the values are positive.

3. Divide (O − E)² by E for each set of values to obtain (O − E)²/E. This has the effect of standardising that element. In this way, for example, a small difference will be more important when the expected frequency is small than when the expected frequency is large.

4. Finally, find the sum, Σ(O − E)²/E. The smaller this quantity is, the better the fit.
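Steps 1 to 4 translate directly into a short function. This is a minimal Python sketch; the name `chi2_statistic` is hypothetical, and it assumes the observed and expected frequencies are listed class by class in the same order. It is checked here with the digit frequencies from the example that follows.

```python
def chi2_statistic(observed, expected):
    """Return X^2 = sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    total = 0.0
    for o, e in zip(observed, expected):
        difference = o - e          # step 1: O - E
        squared = difference ** 2   # step 2: (O - E)^2
        total += squared / e        # step 3: divide by E; step 4: add to the running sum
    return total


# Frequencies of the digits 0-9 in the random-digit example
observed = [4, 10, 7, 16, 12, 8, 10, 11, 12, 10]
expected = [10] * 10
print(round(chi2_statistic(observed, expected), 2))  # 9.4
```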


The following example shows how to calculate X² for given data in a goodness-of-fit test.


PERFORMING A χ² GOODNESS-OF-FIT TEST


Random numbers consist of lists of the ten digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and are such that each digit has an equal chance of appearing at any stage. Each digit, therefore, has a probability of 0.1 of occurring, i.e. P(X = x) = 0.1. This is the discrete uniform distribution (see page 270).




By pressing the random number key [Ran#] on a calculator it is possible to generate a three-digit number between 0.000 and 0.999. For example

[Ran#] 0.593   [Ran#] 0.194   [Ran#] 0.106   and so on.

In this case the random digits are 5, 9, 3, 1, 9, 4, 1, 0, 6.

Here are 100 digits generated on a calculator.

[The listing of the 100 digits is illegible in this copy; their frequencies are given in the table below.]

A χ² goodness-of-fit test is used to test whether the numbers generated on the calculator are random. To make it easier to analyse the data, arrange the digits in a frequency table:

Digit       0    1    2    3    4    5    6    7    8    9    Total
Frequency   4    10   7    16   12   8    10   11   12   10   100


Make null and alternative hypotheses as follows:

H₀: the digits are random
H₁: the digits are not random.

Then calculate the frequencies that you would expect if the digits are random:
Expected frequency for each digit is 100 × 0.1 = 10

Add another row to the table so that the observed frequencies (O) and the expected frequencies (E) can be compared.

Digit                    0    1    2    3    4    5    6    7    8    9    Total
Observed frequency (O)   4    10   7    16   12   8    10   11   12   10   100
Expected frequency (E)   10   10   10   10   10   10   10   10   10   10   100


The frequencies can be illustrated by a vertical line graph:

[Figure: "Distribution of 100 random digits generated on a calculator", a vertical line graph comparing the observed and expected frequencies for each digit]

At first glance the observed frequency for the digit 3 seems much too high and that for the digit 0 seems much too low.


The χ² test compares each observed frequency with the corresponding expected frequency. For each pair, calculate (O − E)²/E, then calculate the sum to give the test statistic

$$X^2 = \sum \frac{(O - E)^2}{E}$$

If X² = 0, then there is exact agreement between the observed and expected frequencies. If X² > 0, then O and E do not agree exactly; the larger the value of X², the greater the discrepancy. A low value of X² implies a good fit, whereas a high value of X² implies a poor fit.


For the above data,

$$X^2 = \frac{(4-10)^2}{10} + \frac{(10-10)^2}{10} + \frac{(7-10)^2}{10} + \cdots + \frac{(10-10)^2}{10} = 9.4$$

The calculations are usually summarised in a table:

O    E    (O − E)²/E
4    10   3.6
10   10   0
7    10   0.9
16   10   3.6
12   10   0.4
8    10   0.4
10   10   0
11   10   0.1
12   10   0.4
10   10   0
ΣO = 100   ΣE = 100   9.4

X² = Σ(O − E)²/E = 9.4
To decide whether the data give a good fit, you need to know whether 9.4 lies in the main body of the distribution or whether it is in the critical (rejection) region in the upper tail. If it lies in the critical region, reject H₀.

The boundary of the critical region is found from the appropriate χ² distribution, which depends on the number of degrees of freedom, ν, the number of independent variables used in calculating the test statistic. It is found as follows:

ν = number of classes − number of restrictions

The number of classes is 10 and there is one restriction (that the total of the expected frequencies is 100), so ν = 10 − 1 = 9. Consider the χ²(9) distribution.

Say that the test is to be carried out at the 5% significance level. The critical value, $\chi^2_{5\%}(9)$, is found from tables (see page 651).

From tables, $\chi^2_{5\%}(9) = 16.919$.
So H₀ will be rejected if X² > 16.919.

Since the calculated value of the test statistic, 9.4, is less than 16.919, it does not lie in the critical region and H₀ is not rejected.
On the evidence obtained, you would accept that the digits are random.

[Figure: χ²(9) curve with the upper 5% tail, to the right of 16.919, shaded as the critical region]
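The whole procedure for the random-digit data fits in a few lines. In this sketch the function name `goodness_of_fit` is hypothetical, the number of restrictions is supplied by the user (1 here, for ΣE = 100), and the critical value 16.919 is read from tables as above rather than computed.

```python
def goodness_of_fit(observed, expected, restrictions, critical_value):
    """Return (X^2, v, reject_h0) for a chi-squared goodness-of-fit test.

    `restrictions` is the number of restrictions in v = classes - restrictions;
    `critical_value` is read from tables for the chosen significance level and v.
    """
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    v = len(observed) - restrictions
    return x2, v, x2 > critical_value


observed = [4, 10, 7, 16, 12, 8, 10, 11, 12, 10]  # digits 0-9
expected = [sum(observed) / 10] * 10               # uniform: 100 x 0.1 = 10 each
x2, v, reject = goodness_of_fit(observed, expected, restrictions=1, critical_value=16.919)
print(f"X^2 = {x2:.1f}, v = {v}, reject H0: {reject}")  # X^2 = 9.4, v = 9, reject H0: False
```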


Summary of the procedure for performing a χ² goodness-of-fit test

For a set of data with observed frequencies O:

1. Make the null hypothesis H₀ that the data are distributed in a particular way and the alternative hypothesis H₁ that they are not.

2. Calculate the frequencies, E, expected if the distribution follows the one given in H₀. Note that small values of E tend to give an unreliable value of X², since dividing by a small E can inflate the contribution of that class, so we adopt the rule that expected frequencies must not be less than 5. If any expected frequency is less than 5, combine adjacent classes to give an expected frequency of at least 5. Combine the corresponding classes in the observed data also and make a revised table.

3. Work out the number of degrees of freedom, ν, where

ν = number of classes − number of restrictions

Decide on the level of the test and look up the appropriate critical value in tables. For example, for a 5% significance level, look up $\chi^2_{5\%}(\nu)$. Use it to state the rejection criteria:

If $X^2 > \chi^2_{5\%}(\nu)$, then the test value lies in the critical (rejection) region. The difference between the observed and expected frequencies is considered to be too great and H₀ is rejected.

If $X^2 < \chi^2_{5\%}(\nu)$, the test value does not lie in the critical region and H₀ is not rejected.

4. Now calculate

$$X^2 = \sum \frac{(O - E)^2}{E}$$

Note, however, that if ν = 1, it is advisable to use Yates' continuity correction. In this case the formula is

$$X^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$$

5. Compare the calculated value of X² with the critical value. Make your conclusion (H₀ is rejected or H₀ is not rejected) and relate it to the context of the situation being investigated.
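Steps 2 and 4 are the places where the arithmetic changes from case to case. The sketch below uses hypothetical helper names. `merge_small_classes` combines adjacent classes from left to right until each expected frequency is at least 5, and any small group left at the end is merged into the class before it; this is one simple way of applying the rule in step 2, and in practice you should check that the combined classes make sense for the data. `yates_statistic` applies the corrected formula from step 4 for the case ν = 1. The coin data in the usage lines are hypothetical.

```python
def merge_small_classes(observed, expected, minimum=5.0):
    """Combine adjacent classes (left to right) until every expected frequency
    is at least `minimum`; a small group left over at the end is merged into
    the previous class. Observed frequencies are combined in the same way."""
    merged_o, merged_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_o or acc_e:  # leftover group with expected frequency below `minimum`
        if merged_e:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e


def yates_statistic(observed, expected):
    """X^2 with Yates' continuity correction: sum of (|O - E| - 0.5)^2 / E."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Expected frequencies 150 x P(X = x) for X ~ B(5, 0.5), as in Example 12.3
o, e = merge_small_classes([4, 19, 41, 52, 26, 8],
                           [4.6875, 23.4375, 46.875, 46.875, 23.4375, 4.6875])
print(o, e)  # [23, 41, 52, 34] [28.125, 46.875, 46.875, 28.125]

# Hypothetical coin showing 60 heads in 100 tosses: two classes, so v = 1
print(round(yates_statistic([60, 40], [50, 50]), 4))  # 3.61
```

Without the correction the coin gives X² = 4.0, which exceeds the 5% critical value 3.841 for ν = 1, whereas the corrected value 3.61 does not, so the correction can change the conclusion in borderline cases.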


Note that when the value of X² is very small, it is wise to query the reliability of the observed data. This is where the lower-tail (left-hand) probabilities might be useful.

For example, suppose that the test involves a χ²(4) distribution and that the calculated value of the test statistic is X² = 0.7.

You can see from the tables on page 651 that $\chi^2_{95\%}(4) = 0.711$, which means that if the null hypothesis is true you would expect a value less than 0.711 from only 5% of samples, so this would be quite rare. You might wonder whether the observed data have been fiddled!
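For an even number of degrees of freedom the χ² distribution function has a simple closed form, P(X ≤ x) = 1 − e^(−x/2) Σ (x/2)^k / k! for k = 0, …, ν/2 − 1. This is a standard result that the chapter does not state, so treat it as an assumption. The sketch below, with the hypothetical name `chi2_cdf_even`, uses it to confirm the example: under χ²(4), a value as small as 0.7 occurs with probability just under 5%.

```python
import math


def chi2_cdf_even(x, v):
    """P(X <= x) for X ~ chi-squared(v), valid only for even v, using
    P(X <= x) = 1 - exp(-x/2) * sum_{k=0}^{v/2 - 1} (x/2)^k / k!"""
    if v <= 0 or v % 2 != 0:
        raise ValueError("this closed form needs a positive, even v")
    half = x / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(v // 2))
    return 1.0 - math.exp(-half) * series


print(round(chi2_cdf_even(0.711, 4), 3))  # 0.05: the lower 5% point
print(round(chi2_cdf_even(0.7, 4), 4))    # 0.0487: a value this small is unusual
```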


TEST 1: GOODNESS-OF-FIT TEST FOR A UNIFORM DISTRIBUTION 


Example 12.1 


The table shows the number of employees absent for just one day during a particular period of time.

Day of the week       Mon   Tues   Wed   Thurs   Fri   Total
Number of absentees   121   87     87    91      114   500


(a) Find the frequencies expected according to the hypothesis that the number of absentees is 
independent of the day of the week. 


(b) Test at the 5% level whether the differences in the observed and expected data are 
significant. 


Solution 12.1

1. State H₀ and H₁.
H₀: The number of absentees is independent of the day of the week.
H₁: The number of absentees is not independent of the day of the week.

If the number of absentees is independent of the day of the week, then you would expect the total of 500 to be spread uniformly throughout the week. Expected number of absentees for any day is 500 ÷ 5 = 100.

                           Mon   Tues   Wed   Thurs   Fri
Observed frequencies (O)   121   87     87    91      114   ΣO = 500
Expected frequencies (E)   100   100    100   100     100   ΣE = 500

3. Work out ν.
Degrees of freedom ν: There are five classes and there is one restriction (ΣE = 500). Therefore ν = 5 − 1 = 4, so consider the χ²(4) distribution.

Perform the test at the 5% level.
From tables $\chi^2_{5\%}(4) = 9.488$, so reject H₀ if X² > 9.488.

O     E     (O − E)²/E
121   100   4.41
87    100   1.69
87    100   1.69
91    100   0.81
114   100   1.96
ΣO = 500   ΣE = 500   10.56

X² = Σ(O − E)²/E = 10.56

Since X² > 9.488, reject H₀. There is evidence, at the 5% level, that the number of absentees on a day is not independent of the day of the week.

Note that the test does not indicate what the relationship might be. The observed frequencies, however, show a tendency towards a greater number of absentees on Mondays and Fridays.
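The arithmetic in Solution 12.1 can be reproduced in a few lines of Python. This sketch hard-codes the tabulated critical value 9.488 rather than computing it, and it also keeps each day's contribution to X², which makes the large Monday contribution easy to see.

```python
observed = {"Mon": 121, "Tues": 87, "Wed": 87, "Thurs": 91, "Fri": 114}
total = sum(observed.values())    # 500
expected = total / len(observed)  # 100 per day under H0

# Contribution (O - E)^2 / E for each day, then the total X^2
contributions = {day: (o - expected) ** 2 / expected for day, o in observed.items()}
x2 = sum(contributions.values())
v = len(observed) - 1             # one restriction: sum of E = 500
critical = 9.488                  # chi^2_5%(4), from tables

print(contributions)
print(f"X^2 = {x2:.2f}, v = {v}, reject H0: {x2 > critical}")  # X^2 = 10.56, v = 4, reject H0: True
```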


TEST 2: GOODNESS-OF-FIT TEST FOR A DISTRIBUTION IN A GIVEN RATIO


Example 12.2 


According to a particular genetic theory, the numbers of colour strains, pink, white and blue, in a certain flower should appear in the ratio 3 : 2 : 5. In a random sample of 100 plants, the corresponding numbers of each colour were 24, 14 and 62. Test at the 1% level whether the differences between the observed and expected frequencies are significant.


Solution 12.2

1. State H₀ and H₁.
H₀: The colours are in the ratio 3 : 2 : 5.
H₁: The colours are not in the ratio 3 : 2 : 5.

According to the null hypothesis, the colours should be in the ratio 3 : 2 : 5, so the expected frequencies are

pink 3/10 × 100 = 30   white 2/10 × 100 = 20   blue 5/10 × 100 = 50

Colour                   Pink   White   Blue
Observed frequency (O)   24     14      62     ΣO = 100
Expected frequency (E)   30     20      50     ΣE = 100

3. Work out ν.
Degrees of freedom ν: There are three classes and there is one restriction (ΣE = 100). Therefore ν = 3 − 1 = 2, so consider the χ²(2) distribution.

Perform the test at the 1% level.
From tables $\chi^2_{1\%}(2) = 9.210$, so reject H₀ if X² > 9.210.

O    E    (O − E)²/E
24   30   1.2
14   20   1.8
62   50   2.88
ΣO = 100   ΣE = 100   5.88

X² = Σ(O − E)²/E = 5.88

Since X² < 9.210, do not reject H₀.

The differences between the observed and expected frequencies are not significant at the 1% level. The data are consistent with the colours being in the ratio 3 : 2 : 5.
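A sketch of Solution 12.2 in Python: the expected frequencies come from sharing the total of 100 in the ratio 3 : 2 : 5, and the critical value 9.210 is the tabulated 1% point of χ²(2).

```python
ratio = [3, 2, 5]        # pink : white : blue under H0
observed = [24, 14, 62]
total = sum(observed)    # 100

# Share the total in the given ratio to get the expected frequencies
expected = [total * r / sum(ratio) for r in ratio]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
v = len(observed) - 1    # one restriction: sum of E = 100
critical = 9.210         # chi^2_1%(2), from tables

print(f"E = {expected}, X^2 = {x2:.2f}, v = {v}, reject H0: {x2 > critical}")
# E = [30.0, 20.0, 50.0], X^2 = 5.88, v = 2, reject H0: False
```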


Exercise 12a 


1. A tetrahedral die is thrown 120 times and the
number on which it lands is noted.

Frequency   28 …   Total 120
(the remaining entries of this table are illegible in this copy)

Test, at the 5% level, whether the die is fair.


2. From a list of 500 digits, the occurrence of each
digit is noted.

Digit       0    1    2    3    4    5    6    7    8    9
Frequency   40   58   49   53   38   56   61   53   60   32

Test, at the 1% level, whether the sequence is a
random sample from a uniform distribution.


3. The outcomes, A, B and C, of a certain 
experiment are thought to occur in the ratio 
1:2:1. The experiment is performed 200 times 
and the observed frequencies of A, B and C are 
36, 115 and 49 respectively. Is the difference in 
the observed and expected results significant? 
Test at the 5% level. 


4. According to genetic theory the number of
colour strains, red, yellow, blue and white, in a
certain flower should appear in proportions
4 : 12 : 5 : 4. Observed frequencies of red, yellow,
blue and white strains amongst 800 plants were
110, 410, 150, 130 respectively. Are these
differences from the expected frequencies
significant at the 5% level? If the number of
plants had been 1600 and the observed
frequencies 220, 820, 300, 260, would the
difference have been significant at the 5% level?


(C) 


5. It is thought that each of the 8 outcomes of an
experiment is equally likely to occur. When the
experiment is performed 400 times, the observed
frequencies are 45, 42, 55, 53, 40, 62, 47 and
56. Perform a test at the 1% level to investigate
the validity of the theory.


6. In a particular subject students are set multiple
choice questions, each of which contains five
alternatives A, B, C, D and E. A teacher suggests
that when students do not know the correct
answer they are twice as likely to choose one of
B, C or D as to choose A or E. For 160
questions where it was known that the student
answered without knowing the correct answer,
A, B, C, D, E were chosen 23, 45, 36, 43 and 13
times respectively. Is there evidence, at the 5%
level, to support the teacher's theory?


7. For a given set of data the observed and expected
frequencies are shown:

Result               1    2    3    4    5
Observed frequency   30   34   42   40   57
Expected frequency   38   45   36   36   45

Are the differences between the observed and
expected frequencies significant at the 1% level?


8. The random variable Y has a χ² distribution with
eight degrees of freedom. Find y such that
P(Y > y) = 0.05. (L)


9. During the course of one year a tutor marked
111 assignments. The grades he awarded and the
comparable national proportions are given in the
table:

Grade                 A     B     C    D
Number he awarded     86    18    6    4
National proportion   71%   16%   7%   6%

Calculate the expected numbers (to one decimal
place) based on the national proportions.

The χ² goodness-of-fit test requires the
summation of terms of the form

(O − E)²/E

where O and E are observed and expected
frequencies. Suggest reasons why

(a) the difference between O and E is used,
(b) this difference is squared, and
(c) the squared difference is divided by E.

Test, at the 5% level, whether there is any
difference between the tutor's and the national
awarding of grades. State your conclusions
clearly. (O)

10. A calibrated instrument is used over a wide range
of values. To assess the operator's ability to read
the instrument accurately, the final digit in each
of 700 readings was noted. The results are
tabulated below.

Final digit   0    1    2    3    4    5    6    7    8    9
Frequency     78   63   50   58   73   95   96   63   46   81

Use an approximate χ² statistic to test whether
there is any evidence of bias in the operator's
reading of the instrument. Use a 5% significance
level and state your null and alternative
hypotheses. (L)


11. The grades in a statistics examination for a
group of students were as follows.

Grade                A    B    C    D    E
Number of students   14   18   32   20   16

Test the hypothesis that the distribution of
grades is uniform. Use a 5% level of significance.
(L)


12. An ordinary die is thrown 120 times and each
time the number on the uppermost face is noted.
The results are as follows:

Number on die   1    2    3    4    5    6
Frequency       14   16   24   22   24   20

Is the die fair? Test at the 10% level.


13. In a certain town an investigation was carried
out into accidents in the home to children under
12 years of age. The numbers of reported
accidents and the ages of the children concerned
are summarised in the table.

Group   Age of child (yrs)   No. of accidents
A       0 to < 2             42
B       2 to < 4             52
C       4 to < 6             28
D       6 to < 8             20
E       8 to < 10            18
F       10 to < 12           16

(a) State the modal class.

(b) Calculate, to the nearest month, the mean
age and the standard deviation of the
distribution of ages.

(c) Draw a cumulative frequency curve, and
from it estimate, to the nearest month, the
median and the interquartile range for the
ages of all children under 12 years of age
concerned in reported accidents in the home.
State, giving a reason, whether you consider
the mean, the mode or the median best
represents the average age for accidents in
the home to children under 12 years of age.

(d) An investigator believes that children in the
groups A, B, C, D, E, F are likely to have
accidents in the home in the ratios
2 : 2 : 1 : 1 : 1 : 1 respectively. Use a χ² test
at a 5% significance level to decide whether
or not this belief is justified. ()




TEST 3: GOODNESS-OF-FIT TEST FOR A BINOMIAL DISTRIBUTION 


Example 12.3 


A farmer kept a record of the number of heifer calves born to each cow during the first five years of breeding of the cow. The results are summarised in the table:

Number of heifers   0   1    2    3    4    5
Number of cows      4   19   41   52   26   8

Test, at the 5% level, whether the binomial distribution B(5, 0.5) is an adequate model for the data.


Solution 12.3

(a) Let X be the number of heifer calves born to a cow in the first five years of breeding.

1. State H₀ and H₁.
H₀: X ~ B(5, 0.5)
H₁: X is not distributed in this way.

To calculate the binomial probabilities, use cumulative probability tables, which give P(X ≤ x).

Alternatively, calculate the probabilities using

$$P(X = x) = \binom{5}{x}(0.5)^{5-x}(0.5)^x = \binom{5}{x}(0.5)^5$$

The total number of cows is 150, so the expected frequencies are found by multiplying P(X = x) by 150.

Note on accuracy: When calculating, it is often necessary to approximate, say to 1 decimal place. If you have memory facilities on your calculator for retaining several numbers, then you may be able to work to greater accuracy.

Using tables (extract from page 645), X ~ B(5, 0.5):

P(X = x)                                  E = 150 × P(X = x)
P(X = 0) = 0.0313                         4.7
P(X = 1) = 0.1875 − 0.0313 = 0.1562       23.4
P(X = 2) = 0.5 − 0.1875 = 0.3125          46.9
P(X = 3) = 0.8125 − 0.5 = 0.3125          46.9
P(X = 4) = 0.9688 − 0.8125 = 0.1563       23.4
P(X = 5) = 1 − 0.9688 = 0.0312            4.7


Check on size of expected frequencies:

Since the expected frequencies for the first and last classes are less than 5, combine them with the next classes. Do a revised table for the expected frequencies and also show the corresponding observed frequencies:

Number of heifers x        0 or 1   2      3      4 or 5
Observed frequencies (O)   23       41     52     34       ΣO = 150
Expected frequencies (E)   28.1     46.9   46.9   28.1     ΣE = 150

3. Work out ν.
Degrees of freedom ν: There are four classes and there is one restriction (ΣE = 150). Therefore ν = 4 − 1 = 3, so consider the χ²(3) distribution.

Perform the test at the 5% level.
From tables $\chi^2_{5\%}(3) = 7.815$, so reject H₀ if X² > 7.815.

O    E      (O − E)²/E
23   28.1   0.925…
41   46.9   0.742…
52   46.9   0.554…
34   28.1   1.238…
ΣO = 150   ΣE = 150   3.461…

X² = Σ(O − E)²/E = 3.46 (2 d.p.)

6. Make your conclusion.
Since X² < 7.815, do not reject H₀. The binomial distribution with n = 5 and p = 0.5 is an adequate model for the data.


(b) If you want to test whether the distribution B(5, p) provides an adequate model, with p unspecified, you would need to estimate p from the data using the fact that the mean of a binomial distribution is np.

From the data,
x̄ = Σfx/Σf = 401/150 = 2.673…

Since x̄ = np,
2.673 = 5p
p = 0.535 (3 d.p.)

So the null hypothesis would be H₀: X ~ B(5, 0.535) and the expected frequencies would be calculated using p = 0.535.

When working out ν, the number of degrees of freedom, you would take into account that there are now two restrictions: one is that ΣE = 150 (as before) and the other is that p is estimated from the sample.

Try working through this test. With p = 0.535 only the class x = 0 has an expected frequency below 5, so after combining it with x = 1 there are five classes. You should find that ν = 5 − 2 = 3, X² ≈ 0.8 (depending on the degree of approximation in your calculations) and H₀ is not rejected.
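Both parts of Solution 12.3 can be checked with the sketch below. It is an illustration under stated assumptions, not the book's method: probabilities are computed exactly with `math.comb` instead of being read from 4-figure tables, and the helper `merge_small_classes` (a hypothetical name) combines adjacent classes left to right until every expected frequency is at least 5. Part (a) reproduces X² ≈ 3.46 with ν = 3. For part (b), working to full precision gives five classes, ν = 3 and X² of roughly 0.8, well below the critical value 7.815 for χ²(3).

```python
from math import comb


def binomial_expected(n, p, total):
    """Expected frequencies total * P(X = x) for X ~ B(n, p), x = 0, 1, ..., n."""
    return [total * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]


def merge_small_classes(observed, expected, minimum=5.0):
    """Combine adjacent classes left to right until each expected frequency is
    at least `minimum`; a small group left at the end joins the previous class."""
    merged_o, merged_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_o or acc_e:
        if merged_e:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e


def chi2_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [4, 19, 41, 52, 26, 8]  # number of cows with x = 0, 1, ..., 5 heifers
total = sum(observed)              # 150

# (a) H0: X ~ B(5, 0.5); one restriction, so v = classes - 1
o_a, e_a = merge_small_classes(observed, binomial_expected(5, 0.5, total))
x2_a = chi2_statistic(o_a, e_a)
v_a = len(o_a) - 1
print(f"(a) O = {o_a}, v = {v_a}, X^2 = {x2_a:.2f}")  # (a) O = [23, 41, 52, 34], v = 3, X^2 = 3.46

# (b) p estimated from the sample mean (mean = np); two restrictions
mean = sum(x * f for x, f in enumerate(observed)) / total  # 401/150
p_hat = mean / 5
o_b, e_b = merge_small_classes(observed, binomial_expected(5, p_hat, total))
x2_b = chi2_statistic(o_b, e_b)
v_b = len(o_b) - 2
print(f"(b) p = {p_hat:.3f}, O = {o_b}, v = {v_b}, X^2 = {x2_b:.2f}")
```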


TEST 4: GOODNESS-OF-FIT TEST FOR A POISSON DISTRIBUTION


Example 12.4 


An analysis of the number of goals scored by the local football team gave the following results:

Goals per match (x)   0    1    2    3    4    5   6   7   Total
Number of matches     14   18   29   18   10   7   3   1   100

Carry out a χ² goodness-of-fit test at the 10% significance level to determine whether or not the above distribution can be reasonably modelled by a Poisson distribution with parameter 2.


Solution 12.4

1. State H₀ and H₁.
Let X be the number of goals scored in a match.
H₀: X ~ Po(2)
H₁: X is not distributed in this way.

To calculate the Poisson probabilities, use cumulative probability tables, which give P(X ≤ x).

Alternatively, calculate the probabilities using

$$P(X = x) = e^{-2}\frac{2^x}{x!}, \quad x = 0, 1, 2, \ldots$$

The total number of matches is 100, so the expected frequencies are found by multiplying P(X = x) by 100.

Using tables (extract from page 647), X ~ Po(2):

x          0        1        2        3        4        5        6        7        8        9
P(X ≤ x)   0.1353   0.4060   0.6767   0.8571   0.9473   0.9834   0.9955   0.9989   0.9998   1.0000

P(X = x)                                      E = 100 × P(X = x)
P(X = 0) = 0.1353                             13.53
P(X = 1) = 0.4060 − 0.1353 = 0.2707           27.07
P(X = 2) = 0.6767 − 0.4060 = 0.2707           27.07
P(X = 3) = 0.8571 − 0.6767 = 0.1804           18.04
P(X = 4) = 0.9473 − 0.8571 = 0.0902           9.02
P(X = 5) = 0.9834 − 0.9473 = 0.0361           3.61
P(X = 6) = 0.9955 − 0.9834 = 0.0121           1.21
P(X = 7 or more) = 1 − 0.9955 = 0.0045        0.45
                                              ΣE = 100

Check on size of expected frequencies:
The χ² test is not valid for expected frequencies less than 5, so combine the last three classes to give 5 or more goals.

Revised table:

x   0       1       2       3       4      5 or more
O   14      18      29      18      10     11          ΣO = 100
E   13.53   27.07   27.07   18.04   9.02   5.27        ΣE = 100


3. Work out ν.
Degrees of freedom ν: There are six classes and there is one restriction (ΣE = 100). Therefore ν = 6 − 1 = 5, so consider the χ²(5) distribution.

4. State the level of the test and the rejection criterion.
Perform the test at the 10% level.
From tables $\chi^2_{10\%}(5) = 9.236$, so reject H₀ if X² > 9.236.

5. Calculate X².

O    E       (O − E)²/E
14   13.53   0.016…
18   27.07   3.038…
29   27.07   0.137…
18   18.04   0.000…
10   9.02    0.106…
11   5.27    6.230…
ΣO = 100   ΣE = 100   9.529…

X² = Σ(O − E)²/E = 9.53 (2 d.p.)

6. Make your conclusion.
Since X² > 9.236, reject H₀. The number of goals per match cannot be modelled by a Poisson distribution with parameter 2.
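The sketch below repeats Solution 12.4 with exact Poisson probabilities in place of table values. The helper names are hypothetical, and the final class is treated as "7 or more". Because the expected frequencies are not rounded, X² comes out at about 9.54 rather than 9.53, which leads to the same conclusion at the 10% level.

```python
from math import exp, factorial


def poisson_expected(lam, total, top):
    """Expected frequencies for x = 0, 1, ..., top - 1 plus a final 'top or more' class."""
    probs = [exp(-lam) * lam ** x / factorial(x) for x in range(top)]
    probs.append(1 - sum(probs))  # P(X >= top)
    return [total * p for p in probs]


def merge_small_classes(observed, expected, minimum=5.0):
    """Combine adjacent classes left to right until each expected frequency is
    at least `minimum`; a small group left at the end joins the previous class."""
    merged_o, merged_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_o or acc_e:
        if merged_e:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e


def chi2_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [14, 18, 29, 18, 10, 7, 3, 1]  # goals 0-6, then "7 or more"
expected = poisson_expected(2, sum(observed), 7)
o, e = merge_small_classes(observed, expected)
x2 = chi2_statistic(o, e)
v = len(o) - 1    # one restriction: sum of E = 100
critical = 9.236  # chi^2_10%(5), from tables
print(f"O = {o}, v = {v}, X^2 = {x2:.2f}, reject H0: {x2 > critical}")
# O = [14, 18, 29, 18, 10, 11], v = 5, X^2 = 9.54, reject H0: True
```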


Example 12.5

Can the data of Example 12.4 be modelled by a Poisson distribution having the same mean as the observed data? Test at the 10% level.

Solution 12.5

For the observed data, x̄ = Σfx/Σf = 230/100 = 2.3.

The null hypothesis is that the distribution is Poisson, with parameter 2.3,
i.e. H₀: X ~ Po(2.3)
H₁: X is not distributed in this way.

The probabilities are found from cumulative tables or by calculating using

$$P(X = x) = e^{-2.3}\frac{2.3^x}{x!}, \quad x = 0, 1, 2, \ldots$$

The expected frequencies are given by 100 × P(X = x).

Using the formula:

P(X = x) to 4 d.p.          E = 100 × P(X = x)
P(X = 0) = 0.1003           10.03
P(X = 1) = 0.2306           23.06
P(X = 2) = 0.2652           26.52
P(X = 3) = 0.2033           20.33
P(X = 4) = 0.1169           11.69
P(X = 5) = 0.0538           5.38
P(X = 6) = 0.0206           2.06
P(X = 7 or more) = 0.0094   0.94
                            ΣE = 100

Combine the last three classes so that all the expected frequencies are greater than 5.

Revised table:

x   0       1       2       3       4       5 or more
O   14      18      29      18      10      11          ΣO = 100
E   10.03   23.06   26.52   20.33   11.69   8.38        ΣE = 100

Degrees of freedom ν
There are six classes.
There are two restrictions:
• ΣE = 100
• The mean of the Poisson distribution has been estimated from the data.
Therefore ν = 6 − 2 = 4, so consider the χ²(4) distribution.

Perform the test at the 10% level.
From tables $\chi^2_{10\%}(4) = 7.779$, so reject H₀ if X² > 7.779.

Calculating X²:

O    E       (O − E)²/E
14   10.03   1.571…
18   23.06   1.110…
29   26.52   0.231…
18   20.33   0.267…
10   11.69   0.244…
11   8.38    0.819…
ΣO = 100   ΣE = 100   4.244…

X² = Σ(O − E)²/E = 4.24 (2 d.p.)

Since X² < 7.779, do not reject H₀.

The number of goals per match can be modelled by a Poisson distribution with the same mean as the observed data.
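Example 12.5 differs from Example 12.4 only in λ, which is now the sample mean, and in the extra restriction this creates. In this sketch (hypothetical helper names, exact probabilities rather than tables) X² comes out at about 4.25, close to the hand-worked value and comfortably below 7.779.

```python
from math import exp, factorial


def poisson_expected(lam, total, top):
    """Expected frequencies for x = 0, 1, ..., top - 1 plus a final 'top or more' class."""
    probs = [exp(-lam) * lam ** x / factorial(x) for x in range(top)]
    probs.append(1 - sum(probs))  # P(X >= top)
    return [total * p for p in probs]


def merge_small_classes(observed, expected, minimum=5.0):
    """Combine adjacent classes left to right until each expected frequency is
    at least `minimum`; a small group left at the end joins the previous class."""
    merged_o, merged_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_o or acc_e:
        if merged_e:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e


observed = [14, 18, 29, 18, 10, 7, 3, 1]  # goals 0-6, then "7 or more"
total = sum(observed)
mean = sum(x * f for x, f in enumerate(observed)) / total  # 230/100 = 2.3

o, e = merge_small_classes(observed, poisson_expected(mean, total, 7))
x2 = sum((oi - ei) ** 2 / ei for oi, ei in zip(o, e))
v = len(o) - 2    # two restrictions: sum of E, and lambda estimated from the data
critical = 7.779  # chi^2_10%(4), from tables
print(f"O = {o}, v = {v}, X^2 = {x2:.2f}, reject H0: {x2 > critical}")
# O = [14, 18, 29, 18, 10, 11], v = 4, X^2 = 4.25, reject H0: False
```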


TEST 5: GOODNESS-OF-FIT TEST FOR A NORMAL DISTRIBUTION

Example 12.6

The height, in centimetres, gained by a conifer in its first year of planting is denoted by the random variable X. The value of X is measured for a random sample of 86 conifers and the results obtained are summarised in the table:

x                    <35   35-45   45-55   55-65   >65
Observed frequency   10    18      28      18      12

(a) Assuming that X is modelled by a N(50, 15²) distribution, calculate the expected frequencies for each of the five classes.
(b) Carry out a χ² goodness-of-fit analysis to test, at the 5% level, the hypothesis that X can be modelled as in (a). (C)

Solution 12.6

(a) X ~ N(50, 15²)
Standardise each X value using $z = \frac{x - 50}{15}$, e.g. when x = 35, $z = \frac{35 - 50}{15} = -1$.

x   35   45       55      65
z   −1   −0.333   0.333   1

Notice that there is symmetry in the diagram.

[Figure: N(50, 15²) curve with the class boundaries x = 35, 45, 55, 65 (z = −1, −0.333, 0.333, 1) marked]

Probabilities                                                      E = probability × 86
P(X < 35) = P(Z < −1) = 1 − 0.8413 = 0.1587                        13.7
P(35 < X < 45) = P(−1 < Z < −0.333) = 0.8413 − 0.6304 = 0.2109     18.1
P(45 < X < 55) = P(−0.333 < Z < 0.333) = 2 × 0.6304 − 1 = 0.2608   22.4
P(55 < X < 65) = 0.2109 (by symmetry)                              18.1
P(X > 65) = 0.1587 (by symmetry)                                   13.7
                                                                   ΣE = 86

Note that the expected frequencies have been given to 1 d.p.

(b) χ² goodness-of-fit test

1. State H₀ and H₁.
H₀: X ~ N(50, 15²)
H₁: X is not distributed in this way.

x   <35    35-45   45-55   55-65   >65
O   10     18      28      18      12     ΣO = 86
E   13.7   18.1    22.4    18.1    13.7   ΣE = 86

(Note that all expected frequencies are greater than 5, so there is no need to combine classes.)

3. Work out ν.
Degrees of freedom ν: There are five classes and one restriction (ΣE = 86). Therefore ν = 5 − 1 = 4, so consider the χ²(4) distribution.

4. State the level of the test and the rejection criterion.
Perform the test at the 5% level.
From tables $\chi^2_{5\%}(4) = 9.488$, so reject H₀ if X² > 9.488.

5. Calculate X².

O    E      (O − E)²/E
10   13.7   0.999…
18   18.1   0.0005…
28   22.4   1.4
18   18.1   0.0005…
12   13.7   0.210…
ΣO = 86   ΣE = 86   2.611…

X² = Σ(O − E)²/E = 2.61 (2 d.p.)

6. Make your conclusion.
Since X² < 9.488, do not reject H₀. The normal model N(50, 15²) is a suitable model.

Example 12.7

A weaving mill sells lengths of cloth with a nominal length of 70 m. A customer measured 100 lengths and obtained the following frequency distribution:

Length (m)   61-67   67-69   69-71   71-73   73-75   75-81
Frequency    1       16      26      19      20      18

Use a χ² test at the 5% level of significance to test whether a normal distribution is a suitable model for the data. (AEB)

Solution 12.7

The null hypothesis is that the distribution is normal. Since neither the mean nor the variance is given, they have to be estimated from the data.

Mid-interval value x   64   68   70   72   74   78
Frequency f            1    16   26   19   20   18

From the calculator: x̄ = 72.24 (see page 32) and s² = 11.578 (3 d.p.) (see page 449).

χ² goodness-of-fit test
H₀: X ~ N(72.24, 11.578)
H₁: X is not distributed in this way.

Standardise the boundary values of the intervals (to 3 d.p.) using

$$z = \frac{x - \mu}{\sigma} = \frac{x - 72.24}{\sqrt{11.578}}$$

e.g. when x = 61, $z = \frac{61 - 72.24}{\sqrt{11.578}} = -3.303$.

x   61       67       69       71       73      75      81
z   −3.303   −1.540   −0.952   −0.364   0.223   0.811   2.574

NOTE: P(X < 61) = P(Z < −3.303) ≈ 0, so take the first class as X < 67.


Probabilities                                                         E = prob × 100
P(X < 67) = P(Z < −1.540) = 1 − 0.9382 = 0.0618                       6.18
P(67 < X < 69) = P(−1.540 < Z < −0.952) = 0.9382 − 0.8294 = 0.1088   10.88
P(69 < X < 71) = P(−0.952 < Z < −0.364) = 0.8294 − 0.6421 = 0.1873   18.73
P(71 < X < 73) = P(−0.364 < Z < 0.223) = 0.6421 + 0.5883 − 1 = 0.2304 23.04
P(73 < X < 75) = P(0.223 < Z < 0.811) = 0.7913 − 0.5883 = 0.2030     20.30
P(75 < X < 81) = P(0.811 < Z < 2.574) = 0.9950 − 0.7913 = 0.2037     20.37
P(X > 81) = P(Z > 2.574) = 1 − 0.9950 = 0.0050                        0.50

Combine the last two classes to give X > 75:

x   <67    67-69   69-71   71-73   73-75   75 and over
O   1      16      26      19      20      18            ΣO = 100
E   6.18   10.88   18.73   23.04   20.30   20.87         ΣE = 100

Degrees of freedom ν
There are six classes.
There are three restrictions:
• ΣE = 100
• The mean of the normal distribution has been estimated from the data.
• The variance of the normal distribution has been estimated from the data.
Therefore ν = 6 − 3 = 3, so consider the χ²(3) distribution.

Perform the test at the 5% level.
From tables $\chi^2_{5\%}(3) = 7.815$, so reject H₀ if X² > 7.815.

O    E       (O − E)²/E
1    6.18    4.341…
16   10.88   2.409…
26   18.73   2.821…
19   23.04   0.708…
20   20.30   0.004…
18   20.87   0.394…
ΣO = 100   ΣE = 100   10.680…

X² = Σ(O − E)²/E = 10.7 (1 d.p.)

Since X² > 7.815, reject H₀.
The normal distribution is not an adequate model for the data.
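Examples 12.6 and 12.7 both need normal probabilities for the class boundaries. The sketch below computes them with `math.erf` instead of reading Φ from tables, and estimates the mean and the unbiased variance s² from the mid-interval values in Example 12.7; the helper names are hypothetical. Because it does not round z to 3 d.p. or Φ to 4 d.p., X² for Example 12.6 comes out at about 2.54 rather than 2.61, while Example 12.7 gives about 10.7; both conclusions are unchanged.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """Phi((x - mu) / sigma), using the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def normal_expected(boundaries, mu, sigma, total):
    """Expected frequencies for the classes below boundaries[0], between successive
    boundaries, and above boundaries[-1]."""
    cdf = [0.0] + [normal_cdf(b, mu, sigma) for b in boundaries] + [1.0]
    return [total * (upper - lower) for lower, upper in zip(cdf, cdf[1:])]


def chi2_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Example 12.6: H0: X ~ N(50, 15^2), both parameters given, so v = 5 - 1
obs_6 = [10, 18, 28, 18, 12]
exp_6 = normal_expected([35, 45, 55, 65], 50, 15, sum(obs_6))
x2_6 = chi2_statistic(obs_6, exp_6)
v_6 = len(obs_6) - 1
print(f"12.6: v = {v_6}, X^2 = {x2_6:.2f}, reject H0: {x2_6 > 9.488}")

# Example 12.7: mean and variance estimated from mid-interval values, so v = 6 - 3
mids = [64, 68, 70, 72, 74, 78]
freq = [1, 16, 26, 19, 20, 18]
n = sum(freq)
mean = sum(m * f for m, f in zip(mids, freq)) / n                            # 72.24
s2 = (sum(f * m * m for m, f in zip(mids, freq)) - n * mean ** 2) / (n - 1)  # about 11.578
exp_7 = normal_expected([67, 69, 71, 73, 75], mean, sqrt(s2), n)  # <67, ..., 75 and over
x2_7 = chi2_statistic(freq, exp_7)
v_7 = len(freq) - 3
print(f"12.7: v = {v_7}, X^2 = {x2_7:.1f}, reject H0: {x2_7 > 7.815}")
```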


Summary of the number of degrees of freedom for goodness-of-fit tests

In this table n is the number of classes after any classes have been combined.

Distribution   Condition                                                    ν
Uniform                                                                     ν = n − 1
Given ratio                                                                 ν = n − 1
Binomial       (a) if p is known                                            ν = n − 1
               (b) if p is unknown and it is estimated from the observed    ν = n − 2
                   frequencies using x̄ = np
Poisson        (a) if λ is known                                            ν = n − 1
               (b) if λ is unknown and it is estimated from the observed    ν = n − 2
                   frequencies using x̄ = λ
Normal         (a) if μ and σ² are known                                    ν = n − 1
               (b) if μ and σ² are unknown and are estimated from the       ν = n − 3
                   observed frequencies

Exercise 12b χ² goodness-of-fit tests for binomial, Poisson and normal distributions

1. Perform a χ² test to investigate whether the
following is drawn from a binomial distribution
with p = 0.3. Use a 5% level of significance.


A new fly spray is applied to 50 samples each of 
five flies and the number of living flies counted 
after one hour. The results were as fol lows: 


Number living 0) 4 Qe 3 an eg 


x 0. 1 2 3 4 s 
E- 12 39 27 15 4 3 


Frequency. 720129 Tyr 


2. A six-sided die with faces numbered as usual 
from 1 to 6 was thrown five times and the 
number of sixes was recorded. The experiment 


was performed 200 times, with the following 
results: 


x 20 1 2 3 4 5 
Frequency: 66° 82.40 10 2 0 


On this evidence, would you consider the die to 

be biased? Fit a suitable distribution to the data 

and test and comment on the goodness of fit. 
(MEI) 


- Under what circumstances would you expect a 
variable, X, to have a binomial distribution? 

What is the mean of X if it has a binomial 

distribution with parameters 7 and p? 


Calculate the mean number of living flies per 
sample and hence an estimate for p, the 
probability of a fly surviving the spray. Using 
your estimate calculate the expected frequencies 
(each correct to one decimal place) 
corresponding to a binomial distribution and 
perform a ¥? goodness-of-fit test using a 5% 
significance level, 


4. Two dice were thrown 216 times, and the number of sixes at each throw was counted. The results were:

   No. of sixes     0     1     2    Total
   Frequency      130    76    10     216

Test the hypothesis that the distribution is binomial with the parameter p = 1/6.

Explain how the test would be modified if the hypothesis to be tested is that the distribution is binomial with the parameter p unknown. (Do not carry out the test.) (O)


5. Smallwoods Ltd. run a weekly football pools competition. One part of this involves a fixed-odds contest where the entrant has to forecast correctly the result of each of five given matches. In the event of a fully correct forecast the entrant is paid out at odds of 100 to 1. During the past two years Miss Fortune has entered this fixed-odds contest 80 times. The table below summarises her results.

   Number of matches correctly forecast per entry (x)    0     1     2     3     4     5
   Number of entries with x correct forecasts (f)        8    19    25    22     5     1

(a) Find the frequencies of the number of matches correctly forecast per entry given by a binomial distribution having the same mean and total as the observed distribution.

(b) Use the χ² distribution and a 10% level of significance to test the adequacy of the binomial distribution as a model for these data.

(c) On the evidence before you, and assuming that the point of entering is to win money, would you advise Miss Fortune to continue with this competition and why? (AEB)


6. Samples of size 5 are selected regularly from a production line and tested. During one week 500 samples are taken and the number of defective items in each sample is recorded.

   Number of defectives, x     0      1      2      3     4     5
   Frequency, f              170    180    120     20     8     2

(a) It is suggested that a binomial model, with mean the same as the observed data, can be used. Find the frequencies expected by this model.

(b) Test whether this binomial model is a good one. Use a 5% level of significance.


7. A group of students are performing an experiment where 20 drawing pins are dropped randomly on to the floor and the number landing point down is counted. The procedure is then repeated several times. Describe the assumptions you would need to make in order to be satisfied with modelling this situation by a binomial distribution. The experiment was carried out until the students had 50 observations; their results are given in the table:

   Number landing point down   3    4    5    6    7    8    9   10   11
   Frequency                   2    2    5    7   17    8    6    1    2

(a) Calculate the mean number landing point down. Hence show that an estimate for the probability of a drawing pin landing point down is 0.35.

(b) What are the parameters of the appropriate binomial distribution for these data? Calculate the probability of exactly eight landing point down, and hence write down, accurate to one decimal place, its expected frequency.

(c) Using appropriate tables, find, making your method clear, the expected number of times five or fewer pins would land point down.

(d) The chi-squared goodness-of-fit test can be used to judge how well data follow a distribution. Group the above data in the following manner and evaluate the missing expected or observed frequencies:

   Number of pins   ≤5      6      7      8     ≥9
   Expected                8.6                 11.8
   Observed          9      7     17

Calculate the value of the chi-squared statistic for these data.

(e) How many degrees of freedom does your test have? By referring to your tables carry out the test and make your findings clear.


8. A local council has records of the number of children and the number of households in its area. It is therefore known that the average number of children per household is 1.40. It is suggested that the number of children per household can be modelled by the Poisson distribution with parameter 1.40. In order to test this, a random sample of 1000 households is taken, giving the following data.


Number of 
children vi Lies é 


Number of 


4 2t 
households. 273 28 


(a) Find the corresponding expected frequencies obtained from the Poisson distribution with parameter 1.40.

(b) Carry out a χ² test, at the 5% level of significance, to determine whether or not the proposed model should be accepted. State clearly the null and alternative hypotheses being tested and the conclusion which is reached. (MEI)


9. The numbers of cars passing a check-point during 100 intervals, each of time 5 minutes, were noted:

   Number of cars   Frequency
   0                    5
   1                   23
   2                   23
   3                   25
   4                   14
   5                   10
   6 or more            0

Fit a Poisson distribution to these data and test the goodness of fit.


10. During the weaving of cloth the thread sometimes breaks. 147 lengths of thread of equal length were observed during weaving and the table records the number of these threads for which the indicated number of breaks occurred.

   Number of breaks per thread    0     1     2     3     4     5
   Number of threads             48    46    30     …

Fit a Poisson distribution to the data and examine whether the deviation between theory and experiment is significant. (MEI)

11. A shop that repairs television sets keeps a record of the number of sets brought in for repair each day. The numbers brought in during a random sample of 40 days were as follows.

   4 0 0 0 2 1 1 0 0 0    0 1 1 0 3 0 0 0 1 0
   4 0 0 0 0 0 2 0 1 0    0 0 0 1 1 1 0 2 0 0

Test, at the 5% significance level, the hypothesis that these numbers are observations from a Poisson distribution. (C)

12. The table gives the distribution for the number of heavy rainstorms reported by 330 weather stations in the United States of America over a one-year period.

   Number of rainstorms (x)   Number of stations reporting x rainstorms
   0                            102
   1                            114
   2                             74
   3                             28
   4                             10
   5                              2
   more than 5                    0

(a) Find the expected frequencies of rainstorms given by the Poisson distribution having the same mean and total as the observed distribution.

(b) Use the χ² distribution to test the adequacy of the Poisson distribution as a model for these data. (AEB)

13. Over a period of 50 weeks the numbers of road accidents reported to a police station are shown in the table below.

   No. of accidents
   No. of weeks

Find the mean number of accidents per week. Use this mean, a 5% level of significance, and your table of χ² to test the hypothesis that these data are a random sample from a population with a Poisson distribution. (O & C)

14. (a) The data in the following table are the result of counting radioactive events in five-second intervals:

   Number of events          0     1     2    ≥3
   Number of observations    5    14    13     8

Show that the mean number of events in a five-second interval is 1.7 (taking the group with frequency 8 to have a mean of 3.5).

(b) Write down the probability of 0, 1, 2, ≥3 events for a Poisson distribution with mean 1.7. Hence obtain to one decimal place the expected frequencies.

(c) Use the chi-squared goodness-of-fit test to assess whether it is reasonable to claim that the data come from a Poisson distribution. Make your method clear and conduct your test at the 10% level.

(d) A student conducting a similar experiment found the chi-squared statistic for his results was 0.015. What conclusions do you draw from this value? (O)


15. For a period of six months 100 similar hamsters were given a new type of feedstuff. The gains in mass are recorded in the table below:

   Gain in mass (g), x     Observed frequency, f
   … < x < −10                    3
   −10 < x < −5                   6
   −5 < x < 0                     …
   0 < x < 5                      …
   5 < x < 10                    24
   10 < x < 15                   16
   15 < x < 20                   14
   20 < x < 25                    8
   25 < x < 30                    3
   30 < x < …                     2

It is thought that these data follow a normal distribution, with mean 10 and variance 100. Use the χ² distribution at the 5% level of significance to test this hypothesis.

Describe briefly how you would modify this test if the mean and variance were unknown. (AEB)

16. The following data give the heights in centimetres of 100 male students.

   Height (cm)   Frequency
   155–160
   161–166
   167–172
   173–178
   179–184
   185–190

(a) Test, at the 5% level, whether the data follow a normal distribution with mean 173.5 cm and standard deviation 7 cm.

(b) Find the expected frequencies for a normal distribution having the same mean and variance as the data given, and test the goodness of fit, using a 5% level of significance. (C)

17. In a European country registration for military service is compulsory for all eighteen-year-old males. All males must report to a barracks where, after an inspection, some people, including all those less than 1.6 m tall, are excused service. The heights of a sample of 125 eighteen-year-olds measured at the barracks were as follows:

   Height, m    1.2–   1.4–   1.6–   1.8–   2.0–2.2
   Frequency      6     34     31     42      12

(a) Use a χ² test and a 5% significance level to confirm that the normal distribution is not an adequate model for these data.

(b) Show that, if the second and third classes (1.4– and 1.6–) are combined, the normal distribution does appear to fit the data. Comment on this apparent contradiction in the light of the information at the beginning of the question. (AEB)


THE χ² SIGNIFICANCE TEST FOR INDEPENDENCE

Sometimes situations arise when data are classified according to two different factors or attributes and these are often displayed in a table, known as a contingency table, for example

(a) examination grades for Mathematics in three further education colleges

                  College
   Grade   Bradley   Cooper   Dunstan
   A          27        35       17
   B          52        36       28
   C           …         …        …
   D          31        43       21
   E          16        17        …
   N           5        12        8

This is a 6 by 3 contingency table (6 rows and 3 columns).

(b)
                  Candidate
   Age           A        B
   18–25        373       …
   25–40        484      187
   40–60        167      563
   Over 60        …      492

This is a 4 by 2 contingency table (4 rows and 2 columns).

You can use a χ² test to investigate whether the two factors are independent or whether there is an association between them. The test follows a similar pattern to the goodness-of-fit test, where the null hypothesis H₀ is that the two factors are independent and the alternative hypothesis H₁ is that there is an association between them.

The following example explains how to calculate the expected frequencies for data given in a contingency table.

Example 12.8

A team are interested in whether the weather has an effect on their results. They play 50 matches, with the following results:

                  Weather
   Result     Good    Bad    Total
   Win         12      4      16
   Draw         5      8      13
   Lose         7     14      21
   Total       24     26      50

The result of the match and the type of weather have been linked in a 3 by 2 contingency table.

The hypotheses are:
H₀: the weather has no effect on the team's results
H₁: the weather has an effect on the team's results

When calculating the expected frequencies, the row and column totals must remain the same.

Consider the cell linking a win with good weather:
Total number of wins (row total) = 16, therefore P(result is a win) = 16/50.
Total number of matches in good weather (column total) = 24, therefore P(good weather) = 24/50.


According to the null hypothesis, the events ‘the result is a win’ and ‘the weather is good’ are independent, so, using the multiplication rule for independent events (see page 198),

P(win and good weather) = P(win) × P(good weather) = 16/50 × 24/50

Expected number of wins in good weather = 50 × 16/50 × 24/50
                                        = (16 × 24)/50
                                        = 7.68

Note that the calculation (16 × 24)/50 gives a clue to the quick way of working out the expected frequency:

Expected frequency = (row total × column total)/grand total

So, for example, the expected number of draws in bad weather is calculated as follows:

Expected frequency = (row total × column total)/grand total = (13 × 26)/50 = 6.76

The completed table for the expected frequencies is:

                   Weather
   Result     Good     Bad     Total
   Win        7.68     8.32     16
   Draw       6.24     6.76     13
   Lose      10.08    10.92     21
   Total     24       26        50

Note that all the expected frequencies are greater than five, so cells do not need to be combined.
| 


3. Work out v.
Degrees of freedom, v
Notice that in this table, once two of the expected frequencies in different rows have been calculated (for example, 7.68 and 6.24), the others are known automatically. This is because the row and column totals must agree with those in the observed data; for example,

if expected number of wins in good weather = 7.68,
then expected number of wins in bad weather = 16 − 7.68 = 8.32.

Number of degrees of freedom, v = 2, and the χ²(2) distribution is considered.
Test at the 1% level.
From tables χ²₁%(2) = 9.21, so reject H₀ if X² > 9.21.

O      E        (O − E)²/E
12     7.68     2.43
5      6.24     0.246...
7     10.08     0.941...
4      8.32     2.243...
8      6.76     0.227...
14    10.92     0.868...
ΣO = 50   ΣE = 50   6.956...

X² = Σ (O − E)²/E = 6.96 (2 d.p.)

Since X² < 9.21, do not reject H₀, and conclude that the team's results are independent of the weather.
At the 1% level, there is no evidence that the weather has an effect on the team's results.
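The rule expected frequency = row total × column total ÷ grand total, and the resulting X² for Example 12.8, can be reproduced in a few lines of plain Python. This is a minimal sketch; the function names are hypothetical, chosen for illustration.

```python
def expected_table(observed):
    """Expected frequencies under independence: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_squared_independence(observed):
    """Return X^2, the degrees of freedom (h - 1)(k - 1) and the expected table."""
    expected = expected_table(observed)
    x2 = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
    h, k = len(observed), len(observed[0])
    return x2, (h - 1) * (k - 1), expected

# Rows: win, draw, lose; columns: good weather, bad weather
observed = [[12, 4], [5, 8], [7, 14]]
x2, v, expected = chi_squared_independence(observed)
print(expected)
print(round(x2, 3), v)   # compare with the 1% critical value 9.21
```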


Finding the number of degrees of freedom, v, in an h by k contingency table

There is a general rule for calculating v for data in a contingency table. In each of the tables shown below, it is possible to work out all the expected frequencies once the values indicated with a × have been found.

4 by 3 table
   ×  ×  ·
   ×  ×  ·
   ×  ×  ·
   ·  ·  ·
v = (4 − 1) × (3 − 1) = 3 × 2 = 6

2 by 4 table
   ×  ×  ×  ·
   ·  ·  ·  ·
v = (2 − 1) × (4 − 1) = 1 × 3 = 3

3 by 2 table
   ×  ·
   ×  ·
   ·  ·
v = (3 − 1) × (2 − 1) = 2 × 1 = 2

2 by 2 table
   ×  ·
   ·  ·
v = (2 − 1) × (2 − 1) = 1 × 1 = 1

In general, if there are h rows, then once (h − 1) expected frequencies in a column have been calculated, the last value in the column is known because the column total must agree.
Similarly, if there are k columns, then once (k − 1) expected frequencies in a row have been calculated, the last value in the row is known because the row total must agree.

For an h by k contingency table,

number of degrees of freedom = (h − 1) × (k − 1).
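As a sketch of why only (h − 1)(k − 1) values are free, the hypothetical helper below takes the top-left (h − 1) by (k − 1) block of expected frequencies together with the row and column totals, and fills in every other cell. It assumes the row totals and column totals add up to the same grand total, as they always do for a contingency table.

```python
def complete_expected(free_block, row_totals, col_totals):
    """Fill an h-by-k table of expected frequencies from its top-left
    (h-1)-by-(k-1) block, using only the row and column totals."""
    h, k = len(row_totals), len(col_totals)
    table = [[0.0] * k for _ in range(h)]
    for i in range(h - 1):
        for j in range(k - 1):
            table[i][j] = free_block[i][j]
        # the last entry in each of the first h-1 rows is fixed by the row total
        table[i][k - 1] = row_totals[i] - sum(table[i][:k - 1])
    # the whole last row is fixed by the column totals
    for j in range(k):
        table[h - 1][j] = col_totals[j] - sum(table[i][j] for i in range(h - 1))
    return table

# Example 12.8 is a 3 by 2 table, so (3 - 1) x (2 - 1) = 2 values are free
table = complete_expected([[7.68], [6.24]], row_totals=[16, 13, 21], col_totals=[24, 26])
for row in table:
    print([round(x, 2) for x in row])
```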


Yates’ correction for a 2 by 2 contingency table

In particular, for a 2 by 2 contingency table, v = 1 and the χ²(1) distribution is considered. In this case, Yates’ correction should be applied when calculating X², where

X² = Σ (|O − E| − 0.5)²/E
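A minimal sketch of Yates’ correction for a 2 by 2 table, computing each E from the margins first. It uses the driving-test figures from Example 12.9, which follows; `yates_chi_squared` is an illustrative name, not a library function.

```python
def yates_chi_squared(observed):
    """X^2 with Yates' correction for a 2 by 2 table: sum of (|O - E| - 0.5)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand   # expected frequency
            x2 += (abs(o - e) - 0.5) ** 2 / e
    return x2

# Rows: male, female; columns: pass, fail
observed = [[28, 12], [34, 26]]
x2 = yates_chi_squared(observed)
print(round(x2, 2), x2 > 3.841)   # 3.841 is the 5% critical value of chi-squared(1)
```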


Example 12.9

A driving school examined the results of 100 candidates who took their test for the first time. It was found that out of the 40 men, 28 passed and out of the 60 women, 34 passed. Do these results indicate, at the 5% significance level, a relationship between the sex of candidate and the ability to pass the driving test at the first attempt?

Solution 12.9

Displaying the results in a contingency table:

              Result of driving test
            Pass    Fail    Total
Male         28      12      40
Female       34      26      60
Totals       62      38     100

1. State H₀ and H₁.
The hypotheses are:
H₀: There is no relationship between the sex of a candidate and the ability to pass the driving test at the first attempt.
H₁: There is a relationship.

To calculate expected frequencies, use

Expected frequency = (row total × column total)/grand total

So expected number of males who pass = (40 × 62)/100 = 24.8.

Use the fact that row and column totals agree with the observed data to work out all the remaining frequencies:

              Result of driving test
            Pass    Fail    Total
Male        24.8    15.2     40
Female      37.2    22.8     60
Totals      62      38      100

For example, 40 − 24.8 = 15.2 and 62 − 24.8 = 37.2.

Note that there are no expected frequencies that are less than 5.

3. Work out v.
Degrees of freedom, v
v = (2 − 1)(2 − 1) = 1, so use the χ²(1) distribution.
Test at the 5% level.
From tables χ²₅%(1) = 3.841, so reject H₀ if X² > 3.841.

Using Yates’ correction,

O      E       (|O − E| − 0.5)²/E
28    24.8     0.293...
34    37.2     0.195...
12    15.2     0.479...
26    22.8     0.319...
ΣO = 100   ΣE = 100   1.289...

X² = Σ (|O − E| − 0.5)²/E = 1.29 (2 d.p.)

Since X² < 3.841, do not reject H₀.
The driving test results do not indicate a relationship between the sex of a candidate and the ability to pass the driving test at the first attempt.


Exercise 12c Contingency tables

1. Two schools enter their pupils for a particular public examination and the results obtained are shown below.

               Credit   Pass   Fail
   School A      …       10      …
   School B     39       10      4

Using an approximate χ² statistic, assess whether or not there is a significant difference between the two schools with respect to the proportions of pupils in the three categories. State your null and alternative hypotheses. (L)

2. Students in the Sociology department of a university decided to conduct a survey into the roles of married couples in performing tasks of housework and child care. They designed a questionnaire for this purpose.
They then contacted 240 married couples who were willing to take part in the survey. Each of the participating couples was randomly allocated to one of two groups. In the first group the wife was asked to complete the questionnaire and in the second group the husband was asked to complete it.
Four response categories were available for a question which asked how the work of cleaning the house was shared between husband and wife. The following table shows the numbers of husbands and wives choosing each category.

   Response category                  Husbands   Wives
   Wife does it all                       21        30
   Wife does most of it                   63        58
   Shared half and half                   28        25
   Husband does all or most of it          8         7

Carry out a χ² test to investigate whether there is an association between the sex of the respondent and the respondent's view of how the work is shared.
Comment on any differences revealed by this survey between the opinions of husbands and wives about who does the household cleaning.

3. The following are data on 150 chickens, divided into two groups according to breed, and into three groups according to yield of eggs:

                              Yield
                        High   Medium   Low
   Rhode Island Red      46      29      …
   Leghorn               27      14      …

Are these data consistent with the hypothesis that the yield is not affected by the type of breed?


4. A research worker studying the ages of adults and the number of credit cards they possess obtained the results shown in the table.

                    Number of cards possessed
                         <3        ≥3
   Age     <30           74        20
           ≥30           50        35

Use the χ² statistic and a significance test at the 5% level to decide whether or not there is an association between age and number of credit cards possessed. (L)


5. An investigation into colourblindness and the sex of a person gave the following results:

                   Colourblindness
             Colourblind   Not colourblind
   Male           36             964
   Female         19             984

Is there evidence, at the 5% level, of an association between the sex of a person and whether or not they are colourblind?


6. In a small survey 350 car owners from four districts P, Q, R, S were found to have cars in price ranges A, B, C, D, the frequencies of the prices being as shown in the table.

                      District
                   P     Q     R     S
   Price    A      9    10    12    19
   range    B     13    20    18    29
            C     24    29    12    25
            D     34    41    18    37

Find the expected frequencies on the hypothesis that there is no association between the district and the price of the car. Use the χ² distribution to test this hypothesis. (AEB)


7. A random sample of 100 shoppers was asked by a market research team whether or not they used Sudsey Soap. 58 said yes and 42 said no. In a second random sample of 80 shoppers, 62 said yes and 18 said no. By considering a suitable 2 × 2 contingency table, test whether these two samples are consistent with each other. (O & C)


8. The table summarises the incidence of cerebral tumours in 141 neurosurgical patients.

                                  Type of tumour
                             Benign   Malignant   Others
   Site of    Frontal lobes     23         9          6
   tumour     Temporal lobes    21         4          3
              Elsewhere         34        24         17

Find the expected frequencies on the hypothesis that there is no association between the type and site of a tumour. Use the χ² distribution to test this hypothesis. (AEB)

9. In an examination 37 out of 47 boys passed and 27 out of 41 girls passed. By considering a suitable 2 × 2 contingency table, test whether boys and girls differ in their ability in this subject.

10. In an investigation into eye colour and left- or right-handedness the following results were obtained:

                      Handedness
                     Left    Right
   Eye      Blue      15       85
   colour   Brown     20       80

Is there evidence, at the 5% level, of an association between eye colour and left- or right-handedness?

11. In 1988 the number of new cases of insulin-dependent diabetes in children under the age of 15 years was 1495. The table below breaks down this figure according to age and sex.

   Age (yrs)    0–4    5–9   10–14   Total
   Boys         205    248    328     781
   Girls        182    251    281     714
   Total        387    499    609    1495

Perform a suitable test, at the 5% significance level, to determine whether age and sex are independent factors. (C)

12. When analysing the results of a 3 × 2 contingency table it was found that

   Σ (Oᵢ − Eᵢ)²/Eᵢ = 38

Write down the number of degrees of freedom and the critical value appropriate to these data in order to carry out a χ² test of significance at the 5% level. (L)

13. In a college, three different groups of students sit the same examination. The results of the examination are classified as Credit, Pass or Fail. In order to test whether or not there is a difference between the groups with respect to the proportion of students in the three grades, the statistic

   Σ (O − E)²/E

is evaluated and found to be equal to 10.28.

(a) Explain why there are four degrees of freedom in this situation.

(b) Using a 5% level of significance, carry out the test and state your conclusions. (L)


14. The personnel manager of a large firm is investigating whether there is any association between the length of service of the employees and the type of training they receive from the firm. A random sample of 200 employee records is taken from the last few years and is classified according to these criteria. Length of service is classified as short (meaning less than 1 year), medium (1–3 years) and long (more than 3 years). Type of training is classified as being merely an initial ‘induction course’, proper initial on-the-job training but little if any more, and regular and continuous training. The data are as follows:

                                         Length of service
                                       Short   Medium   Long
   Type of      Induction course         14      23      13
   training     Initial on-the-job       12       7      13
                Continuous                2      82      88

Examine at the 5% level of significance whether these data provide evidence of association between length of service and type of training, stating clearly your null and alternative hypotheses.

Discuss your conclusions. (MEI)


15. A market research organisation interviewed a random sample of 120 users of launderettes in London and found that 37 preferred brand X washing powder, 66 preferred brand Y and the remainder preferred brand Z. A similar survey was carried out in Birmingham. In this survey, of 80 people interviewed, 19 preferred brand X, 40 preferred brand Y and the remainder preferred brand Z. Test whether these results provide significant evidence, at the 5% level, of different preferences in the two cities. (C)

16. The results obtained by 200 students in chemistry and biology are shown in the table. Test, at the 5% level, whether the performances in both subjects are related.

                       Chemistry
                      Pass    Fail
   Biology    Pass     102      45
              Fail      24      32


Summary

χ² significance test
● The test statistic is X², where X² = Σ (O − E)²/E and X² ~ χ²(v).
● Remember to combine classes if E < 5.
● When v = 1, use Yates’ correction, where X² = Σ (|O − E| − 0.5)²/E.
● Degrees of freedom, v
   χ² goodness-of-fit tests: v = number of classes − number of restrictions (see table on page 579)
   χ² tests for independence: for an h by k contingency table, v = (h − 1)(k − 1).


Miscellaneous worked examples

Example 12.10

In experiments in pea breeding Gregor Mendel obtained the following data relating to 556 peas:

   Round and yellow   Wrinkled and yellow   Round and green   Wrinkled and green
         315                  101                 108                 32

According to Mendel’s theoretical results the expected figures are in the ratios 9 : 3 : 3 : 1.
Calculate the value of χ² for these data on the assumption that the theory is correct.
Test at the 10% significance level whether the theory is contradicted.
It has been suggested that Mendel’s results are suspect in that they are unlikely to have been obtained from random observations. Comment on this suggestion in relation to the value of χ² calculated. (C)

Solution 12.10

H₀: The different types of peas occur in the ratio 9 : 3 : 3 : 1.
H₁: The different types of peas do not occur in this ratio.

Expected frequencies, according to H₀:
Round and yellow       9/16 × 556 = 312.75
Wrinkled and yellow    3/16 × 556 = 104.25
Round and green        3/16 × 556 = 104.25
Wrinkled and green     1/16 × 556 = 34.75

                 Round and   Wrinkled and   Round and   Wrinkled and
                 yellow      yellow         green       green
Observed (O)     315         101            108         32            ΣO = 556
Expected (E)     312.75      104.25         104.25      34.75         ΣE = 556

There are four classes and one restriction (ΣE = 556).
Therefore v = 4 − 1 = 3 and the χ²(3) distribution is considered.
Perform the test at the 10% level.
From tables χ²₁₀%(3) = 6.251, so reject H₀ if X² > 6.251.

O       E        (O − E)²/E
315    312.75    0.0161...
101    104.25    0.101...
108    104.25    0.134...
32      34.75    0.217...
ΣO = 556   ΣE = 556   0.470...

X² = Σ (O − E)²/E = 0.47 (2 d.p.)

Since X² < 6.251, do not reject H₀: the data are consistent with the types occurring in the ratio 9 : 3 : 3 : 1.

The calculated value of X² is very small indeed, suggesting very little discrepancy between the observed and expected frequencies.

From χ² tables, P(X² < 0.352) = 5% and P(X² < 0.584) = 10%, so you would expect a test value as low as 0.47 on only between 5% and 10% of occasions. This could suggest that the data are not random observations.
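The calculation for Mendel's data, and the lower-tail probability behind the comment on a suspiciously small X², can be sketched as follows. For 3 degrees of freedom the χ² distribution function has the closed form erf(√(x/2)) − √(2x/π)e^(−x/2), which avoids needing a statistics library; the function name `chi2_cdf_3df` is illustrative.

```python
import math

def chi2_cdf_3df(x: float) -> float:
    """P(chi-squared(3) < x), using the closed form valid for 3 degrees of freedom."""
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

observed = [315, 101, 108, 32]          # RY, WY, RG, WG
ratio = [9, 3, 3, 1]
n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
v = len(observed) - 1                   # one restriction: the total

print(expected)                         # [312.75, 104.25, 104.25, 34.75]
print(round(x2, 3), v)
print(round(chi2_cdf_3df(x2), 3))       # lower-tail probability of a value this small
```

The lower-tail probability comes out at about 0.075, between the 5% and 10% points.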


Example 12.11

Mr and Mrs Smith live in a small town with two primary schools, A and B. They are trying to decide which school would provide the better learning environment for their children. They have available the results of recent national tests in mathematics, English and science. Each child in the final year took three tests, one in each subject, and they either passed or failed each test. These results are summarised in the table below.

              3 passes   1 or 2 passes   No passes
   School A      15            6              5
   School B      10           14             13

(a) Stating your hypotheses clearly, test, at the 5% level of significance, whether or not there is evidence of an association between school and test results.

Mr and Mrs Smith also have available the results of a questionnaire about the annual family income x, in thousands of pounds, of the families of the children taking these tests. The results are summarised in the table below.

              x > 30   20 < x < 30   15 < x < 20   x < 15
   School A      7          5             9           5
   School B      6         13             8          10

A χ² test for association between school and family income using this information gave a test statistic of 3.545. There was no pooling of classes.

(b) Using a 5% level of significance, interpret this statistic, stating the critical value used.

(c) In the light of parts (a) and (b) state, giving reasons, which of the two schools Mr and Mrs Smith might choose for their children. (L)

Solution 12.11

(a) H₀: The two factors ‘school’ and ‘results’ are independent.
H₁: The factors are not independent and there is an association between school and results.

Observed data:
              3 passes   1 or 2 passes   No passes   Totals
   School A      15            6              5         26
   School B      10           14             13         37
   Totals        25           20             18         63

Expected data:
For school A and three passes,

expected frequency = (row total × column total)/grand total = (26 × 25)/63 = 10.32 (2 d.p.)

The complete table is as follows:
              3 passes   1 or 2 passes   No passes   Totals
   School A     10.32         8.25           7.43       26
   School B     14.68        11.75          10.57       37
   Totals       25            20             18         63

The table has 2 rows and 3 columns, so v = (2 − 1)(3 − 1) = 1 × 2 = 2 and the χ²(2) distribution is considered.
From tables, χ²₅%(2) = 5.991, so reject H₀ if X² > 5.991.

O      E        (O − E)²/E
15    10.32     2.122...
6      8.25     0.613...
5      7.43     0.794...
10    14.68     1.491...
14    11.75     0.430...
13    10.57     0.558...
ΣO = 63   ΣE = 63   6.012...

X² = Σ (O − E)²/E = 6.01 (2 d.p.)

Since X² > 5.991, reject H₀ and conclude that there is evidence of an association between the school and the test results.

(b) H₀: The two factors ‘school’ and ‘family income’ are independent.
H₁: There is an association between school and family income.
The table has 2 rows and 4 columns, so v = (2 − 1)(4 − 1) = 1 × 3 = 3 and the χ²(3) distribution is considered.
From tables χ²₅%(3) = 7.815, so reject H₀ if X² > 7.815.
It is given that X² = 3.545.
Since X² < 7.815, do not reject H₀. There is no evidence of an association between school and family income.

(c) As there is no evidence of an association between school and family income, Mr and Mrs Smith are likely to base their choice on the results of the national tests. Since 58% of the pupils in school A obtained three passes, compared with only 27% with three passes in school B, Mr and Mrs Smith might conclude that school A provides the better learning environment.
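The hand calculation in part (a) rounds each E to 2 d.p. before forming X². The sketch below (plain Python; the helper name and its `decimals` parameter are hypothetical) compares that with full-precision expected frequencies, and also reproduces the percentages used in part (c).

```python
def chi_squared_independence(observed, decimals=None):
    """X^2 for a contingency table; optionally round each E first, as a hand calculation would."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    grand = sum(rows)
    x2 = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            e = r * c / grand
            if decimals is not None:
                e = round(e, decimals)
            x2 += (observed[i][j] - e) ** 2 / e
    return x2

observed = [[15, 6, 5], [10, 14, 13]]     # school A, school B
exact = chi_squared_independence(observed)
rounded = chi_squared_independence(observed, decimals=2)
print(round(exact, 3), round(rounded, 3), exact > 5.991, rounded > 5.991)

# Proportion of pupils with three passes in each school
for name, row in zip("AB", observed):
    print(name, f"{row[0] / sum(row):.0%}")
```

Full precision gives X² ≈ 6.018 against 6.012 with rounded E. Both exceed 5.991, so the decision in part (a) does not depend on the rounding, although the margin is small.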


Miscellaneous exercise 12d

1. It is suggested that preferences for three proposed routes for a town bypass are related to where people live. Each person in a sample of people living in the town and in the surrounding villages was asked which route he or she preferred. The results are given in the following table.

              Town   Surrounding villages
   Route 1     50             …
   Route 2     28             …
   Route 3     16             …

State appropriate null and alternative hypotheses and use a χ² test to examine, at the 5% significance level, the suggestion that there is an association between preferred route and where people live.


2. (a) A random sample of supermarkets was sent a questionnaire on which they were asked to report the number of cases of shoplifting they had dealt with in each month of the previous year. The totals for each month were as follows.

Month    J   F   M   A   M   J   J   A   S   O   N   D
Cases   16  12  10  17   6  18  16  17  10  22  14  16

Carry out a chi-squared test at an appropriate level of significance to determine whether or not shoplifting is more likely to occur in some months than in others. You may take all months to be of equal length. Make clear your null and alternative hypotheses, the level of significance you are using, and your conclusion.

You may, if you wish, use the fact that when all the values of f_e are equal, the chi-squared test statistic may be written as

$$X^2 = \frac{1}{f_e}\sum f_o^2 - N$$

where N is the total frequency.

(b) Prove the result given at the end of part (a).
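The following Python sketch checks the shortcut quoted in part (a) numerically. It assumes that N = Σf_o and that every expected frequency equals N/k for k classes. It is a sanity check, not the proof asked for in part (b). The function names are hypothetical and the counts in the usage lines are invented.

```python
def chi_squared(observed, expected):
    """Standard statistic: the sum of (f_o - f_e)^2 / f_e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_equal_expected(observed):
    """Shortcut (1 / f_e) * sum(f_o^2) - N, valid only when every f_e equals N / k."""
    n_total = sum(observed)
    f_e = n_total / len(observed)
    return sum(o * o for o in observed) / f_e - n_total


counts = [10, 20, 30]
equal_e = [sum(counts) / len(counts)] * len(counts)
print(chi_squared(counts, equal_e), chi_squared_equal_expected(counts))
```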


3. It is thought that there is an association between the colour of a person's eyes and the reaction of the person's skin to ultraviolet light. In order to investigate this, each of a random sample of … people was subjected to a standard dose of ultraviolet light. The degree of reaction was noted, '−' indicating no reaction, '+' indicating slight reaction and '++' indicating strong reaction. The results are shown in the table below.

                        Eye colour
Reaction     Blue    Grey or green    Brown
−              7           8            18
+             29          10             …
++            21           9             …

Perform an appropriate test at the 5% significance level, stating your null and alternative hypotheses.


4. Describe briefly how the number of degrees of freedom is calculated in a χ² goodness of fit test.

The following set of grouped data from 100 observations has mean 1.03. It is thought to come from a normal distribution with variance 1 but unknown mean. Using a χ²-distribution, test this hypothesis at the 1% significance level.

Lower value of        Number of
grouping interval     observations
−∞                         0
−2.0                       1
−1.5                       0
−1.0                       6
−0.5                      10
0.0                       12
0.5                       15
1.0                       23
1.5                       16
2.0                       13
2.5                        3
3.0                        1
3.5                        0
(C)
5. A farmers' cooperative decided to test three brands of fertiliser, A, B and C. The fertilisers were applied at random to 75 plots. The yield of each plot was classified as high, medium or low. The results are summarised in the table below.

                    Fertiliser
Yield         A      B      C      Total
High          …      …      …        …
Medium        …      …      …        …
Low           …      …      …        …
Total         …      …      …       75

(a) Stating your hypotheses clearly, test at the 5% level of significance whether or not there is any evidence of an association between brand of fertiliser and yield.

Fertilisers A and B are produced by Quickgrow whereas C is produced by Bumpercrops. The farmers wanted to decide from which company to purchase fertiliser and combined the figures for A and B to give a 3 × 2 table. The statistic

$$\sum \frac{(O - E)^2}{E}$$

for this new table was calculated and gave the value 7.622.

(b) By carrying out a suitable test at the 5% level of significance, advise the farmers whether or not there is any evidence of an association between the choice of company and yield.

(c) Giving your reason, advise the farmers which company they should use. (L)


6. A statistician, who is suspected to be suffering from asthma, is asked to record his peak flow measurement four times each day for a period of four weeks.

He groups by value the 112 recorded measurements into seven classes giving observed frequencies, oᵢ, i = 1, 2, ..., 7. He then correctly calculates the corresponding expected frequencies, eᵢ, using a normal distribution having mean and variance estimated from the original measurements.

The value of the test statistic

$$\sum_{i=1}^{7} \frac{(o_i - e_i)^2}{e_i}$$

is then calculated correctly by the statistician as 5.624.

(a) Using a 1% level of significance and stating the null hypothesis, complete the test.

(b) Give the usual requirement made on each of the values eᵢ prior to calculating the test statistic, and indicate how a failure to meet the requirement may be overcome. (NEAB)


7. A random sample of 100 people was asked for their opinions about the amount of sport shown on TV. Each person had to say whether there was too much sport shown, about the right amount, or not enough. The numbers of men and women making each response are shown in the table.

                      Men    Women
Too much sport         13      26
About right            22      22
Not enough sport       12       5

The null hypothesis is that a person's opinion about the amount of sport shown on TV is independent of the person's sex.

(a) Construct a table showing the expected frequencies, assuming that the null hypothesis is true.

(b) Use a χ² test to test this null hypothesis, using a 5% significance level. Show full details of your method and state your conclusion clearly. (C)


8. The Director of Studies at a College of Further Education believed that there was a connection between candidates' grades in mathematics and physics at A-level. For a set of candidates who had taken both examinations, she recorded the number of candidates in each of four categories, as shown in the table.

                          Mathematics    Mathematics
                          grades A-C     grades D-U
Physics grades A-C            22              9
Physics grades D-U             8             15

(a) Test the Director's belief at the 2.5% level of significance, stating your null and alternative hypotheses.

Her colleague said that she was losing accuracy by combining the grades A to C in one group, and grades D to U in another. He suggested that she should create a 7 × 7 table showing all possible combinations of grades.

(b) State why his suggestion might lead to a problem in performing the test. (L)


9. During a working day a machine requires occasional adjustments which appear to be randomly distributed throughout the day. A factory foreman records the number of adjustments made to the machine each day for a period of 200 working days, obtaining the data displayed in the table.

Number of adjustments    0    1    2    3    4    5
Number of days          34   78   61   20    5    2

Previous experience has suggested that the daily number of adjustments to this machine follows a Poisson distribution with mean 1.5.

(a) Perform a χ² goodness of fit test to decide whether the data in the table can reasonably be considered as conforming to a Poisson distribution with mean 1.5.

(b) Outline, without detailed calculation, the necessary modifications to your test if the Poisson mean is not assumed to be 1.5.

(c) The distribution B(5, 0.3) is a very good fit to the data in the table. Without further calculation, explain why, despite this good fit, the binomial model is not appropriate. (NEAB)


10. A department store has five doorways, each for entrance and exit. It is claimed that the proportion of shoppers entering or leaving the store is the same for each of the five doorways.

The number of customers entering or leaving the store is counted at each doorway for three randomly selected days with the following results.

Doorway    Number of customers
A                 601
B                 673
C                 626
D                 618
E                 702

Test whether or not these data support the claim.

The same store also records the daily number of sales charged to stolen credit cards. The results for the first four months of 1990 are as follows.

Number of sales    Number of days
0                        31
1                        39
2                        19
3                        11
≥ 4                       …

Explain why a Poisson distribution may be appropriate as a model for the daily number of sales charged to stolen credit cards. Test the hypothesis that the daily number of sales does follow a Poisson distribution. (NEAB)

11. In the mathematics department of a college, candidates in an examination are graded A, B, C, D or E. Records from previous years show that examiners have awarded a grade A to 15% of candidates, B to 20%, C to 35%, D to 25% and E to 5%. A new syllabus is examined by a new board of examiners who award the grades to 200 candidates as follows:

A, 33; B, 37; C, 81; D, 36; E, 13

(a) Stating clearly your hypotheses and using a 5% level of significance, investigate whether or not the new board of examiners awards grades in the same proportions as the previous one.

In addition to being classified by examination grade, these 200 students are classified as male or female and the results summarised in a contingency table. Assuming all expected values are 5 or more, the statistic

$$\sum \frac{(O_i - E_i)^2}{E_i}$$

was 14.27.

(b) Stating your hypotheses and using a 1% significance level, investigate whether or not sex and grade are associated. (L)

12. A factory operates four production lines. Maintenance records show that the daily numbers of stoppages due to mechanical failure were as shown in the table below (it is possible for a production line to break down more than once on the same day). You may assume that Σf = 1400, Σfx = 1036.

Number of stoppages, x     0     1     2     3     4     5    6 or more
Number of days, f        728   447   138    48    26    13        0

(a) Use a χ² distribution and a 1% significance level to determine whether the Poisson distribution is an adequate model for the data.

(b) The maintenance engineer claims that breakdowns occur at random and that the mean rate has remained constant throughout the period. State, giving a reason, whether your answer to (a) is consistent with this claim.

(c) Of the 1036 breakdowns which occurred, 230 were on production line A, 303 on B, 270 on C and 233 on D. Test at the 5% significance level whether these data are consistent with breakdowns occurring at an equal rate on each production line. (AEB)

13. A group of students studying A-level statistics was set a paper, to be attempted under examination conditions, containing four questions requiring the use of the χ² distribution. The following table shows the type of question and the number of students who obtained good (14 or more out of 20) and bad (fewer than 14 out of 20) marks.

                       Type of question
              Contingency   Binomial   Normal   Poisson
              table         fit        fit      fit
Good mark         25           12         12        …
Bad mark           4           11          3       12

(a) Test at the 5% significance level whether the mark obtained (by the students who attempted the question) is associated with the type of question.

(b) Under some circumstances it is necessary to combine classes in order to carry out a test. If it had been necessary to combine the binomial fit question with another question, which question would you have combined it with and why?

(c) Given that a total of 30 students sat the paper, test, at the 5% significance level, whether the number of students attempting a particular question is associated with the type of question.

(d) Comment on the difficulty and popularity of the different types of question in the light of your answers to (a) and (c). (AEB)


14. (a) The number of books borrowed from a library during a certain week was … on Monday, 431 on Tuesday, 485 on Wednesday, 443 on Thursday and 523 on Friday. Is there any evidence that the number of books borrowed varies between the days of the week? Use a 1% level of significance. Interpret fully your conclusions.

(b) A study of the rate of turnover of employees by a personnel manager produced the following table showing the length of stay of 200 people who left the company for other employment.

                   Length of employment (years)
Grade              0-2      2-5      >5
Managerial          …        …        …
Skilled             …        …        …
Unskilled           …        …        …

Using a 1% level of significance, analyse this information and state fully the conclusions from your analysis. (AEB)
15. (a) Over a long period of time, a research team recorded the number of car accidents which occurred in a certain county. Each accident was classified as being trivial (minor damage and no personal injuries), serious (damage to vehicles and passengers, but no deaths) or fatal (damage and loss of life). The colour of the car which, in the opinion of the research team, was responsible for the accident was also recorded, together with the day of the week on which the accident occurred. The following data were collected.

Colour     Trivial    Serious    Fatal
White        50         25        16
Black        35         39        18
Green        28         23        13
Red          25         17        11
Yellow       17         20        16
Blue         24         33        10

Analyse these data for evidence of association between the colour of the car and the type of accident.

State the condition which sometimes necessitates the amalgamation of rows or columns in contingency tables. Explain why amalgamation might not be appropriate for this table.

(b) The following table summarises the data relating to the day of the week on which the accident occurred.

Day          Number of accidents
Monday              60
Tuesday             54
Wednesday           48
Thursday            53
Friday              53
Saturday            75
Sunday              77

Test the hypothesis that these data are a random sample from a uniform distribution. (AEB)


16. (a) The number of accidents per day on a stretch of motorway was recorded for 100 days and the following results obtained.

Number of accidents    Frequency
0                          44
1                          32
2                           9
3                          10
4                           5
5 or more                   0

Examine whether or not a Poisson model is suitable to represent the number of accidents per day on this stretch of road. Use a 1% level of significance.

(b) The results of a survey to establish the attitude of individuals to a particular political proposal showed that three-quarters of those interviewed were house owners. Of the 44 interviewed, only 6 of the 35 in favour of the proposal were not house owners.

Does the survey indicate that a person's attitude to the proposal is independent of house ownership? Use a 1% level of significance. (AEB)


Mixed test 12A

1. The number of telephone calls received per day over a period of 150 days is shown in the table below.

Number of calls     0     1     2     3
Number of days     50    54    36    10

(a) Find the mean number of calls per day.

(b) What conditions are necessary for a Poisson model to be appropriate in this case?

(c) Carry out a χ² goodness of fit analysis to test the null hypothesis that the number of telephone calls received per day follows a Poisson distribution with mean …, using a 5% significance level. Give full details of your method.


2. A University Sociology Department believes that students who obtain a good grade in A level General Studies tend to do well on sociology degree courses. To check this, it collected information on a random sample of 100 students who had graduated and had also taken General Studies at A level. The students' performances at A level were divided into two categories, Grade A or B, and 'others'. Their degree classes were recorded as Class I, Class II, Class III and Fail. The data are given in the table below.

                                    Class of degree
                          Class I   Class II   Class III   Fail   Total
General Studies  A or B      11        …          …          …      …
grade            Others       4       28          …          …      …
Total                        15       50          …          …     100

Use these data to test, at the 1% significance level, the hypothesis that degree class is independent of General Studies A level performance. State your conclusion clearly.

3. The heights (x) of 100 police officers recruited to a police force in a particular year are summarised in the following table. The mean and standard deviation of the original data are 180 cm and 3 cm respectively.

Height (cm)           Frequency
x < 175                    2
175 ≤ x < 177             15
177 ≤ x < 179             29
179 ≤ x < 181             25
181 ≤ x < 183             12
183 ≤ x < 185             10
185 ≤ x                    7

Fit an appropriate normal distribution to the data, and test the goodness of fit at the 5% level. (C)


4. A survey in a college was commissioned to investigate whether or not there was any association between gender and passing a driving test. A group of 50 male and 50 female students was asked whether they passed or failed their driving test at the first attempt. All those asked had taken the test. The results were as follows.

            Pass    Fail
Male          …       …
Female        …       …

Stating your hypotheses clearly, test, at the …% level, whether or not there is any evidence of an association between gender and passing a driving test at the first attempt.


Mixed test 12B 


1. It is claimed that when homing pigeons are disorientated harmlessly they will exhibit no particular preference for any direction of flight after take-off. To test this, 128 pigeons, from lofts in a particular region, were disorientated harmlessly and then all released from a position 100 miles south of the region. The direction of flight of each pigeon was recorded with the following results.

Flight direction     0°-90°    90°-180°    180°-270°    270°-360°
Number of pigeons      30         35           36           27

Use the χ² goodness of fit test to determine whether or not these data can be used to discredit the claim. (NEAB)


2. An increasing number of people are spending their working hours in front of a visual display unit (VDU). Sixty-five workers using non-adjustable screens and 66 workers using adjustable screens were asked if they experienced annoying reflections from the screens. The resulting responses are given in the table below.

                              Annoying reflection
                               Yes          No
Non-adjustable screen           …            …
Adjustable screen               …            …

Test the claim that there is no association between screen type and a worker's experience of annoying reflections. (NEAB)


3. A six-sided die is believed to be biased in the 
following way: 


the probabilities of throwing a one, a two, a 
three or a four are equal; 

the probability of throwing a five is twice the 
probability of throwing a one; 

the probability of throwing a six is three times 
the probability of throwing a one. 


The die is thrown 150 times, and the results are 
recorded in the table below. 


Score        1     2     3     4     5     6
Frequency   18    15     …     …     …     …

Test, at the 5% significance level, the belief that the die is biased in the way described. (C)


4. A student of botany believed that multifolium uniflorum plants grow in random positions in grassy meadowland. He recorded the number of plants in one square metre of grassy meadow, and repeated the procedure to obtain the 148 results in the table.

Number of plants    0    1    2    3    4    5    6    7 or greater
Frequency           9   24   43   34    …    …    …         …

(a) Show that, to two decimal places, the mean number of plants in one square metre is 2.59.

(b) Give a reason why the Poisson distribution might be an appropriate model for these data.

Using the Poisson model with mean 2.59, expected frequencies corresponding to the given frequencies were calculated, to two decimal places, and are shown in the table below.

Number of plants    Expected frequencies
0                        11.10
1                        28.76
2                          s
3                        32.15
4                        20.82
5                        10.78
6                         4.65
7 or greater               t

(c) Find the values of s and t to two decimal places.

(d) Stating clearly your hypotheses, test at the 5% level of significance whether or not this Poisson model is supported by these data. (L)


Significance tests for correlation coefficients 


In this chapter you will learn about

• a significance test for r, the product-moment correlation coefficient
• a significance test for rₛ, Spearman's coefficient of rank correlation.


Background knowledge

You should be familiar with the ideas associated with correlation (see Chapter 2, page 119), in particular the product-moment correlation coefficient (page 139) and Spearman's coefficient of rank correlation (page 146).


SIGNIFICANCE TESTS FOR CORRELATION COEFFICIENTS 


Before tackling this section you need to review the work covered in Chapter 2 on the product-moment correlation coefficient and Spearman's rank correlation coefficient.

Once a correlation coefficient has been calculated it is usual to make an assessment of its value. You might say, for example, that there is good positive correlation, or that there is weak negative correlation. There is a significance test that allows you to decide whether there is a correlation between the variables, backed by statistical theory rather than just a suspicion.


TEST FOR THE PRODUCT-MOMENT CORRELATION COEFFICIENT, r


In Chapter 2 (page 139) you learnt how to calculate r, the product-moment correlation coefficient between two sets of data X and Y.


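Using small s format:

$$r = \frac{s_{xy}}{s_x s_y}, \quad \text{where } s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y}$$

Using big S format:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \quad \text{where } S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \quad S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \quad S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$$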


Remember that r is such that −1 ≤ r ≤ 1, where
r = −1 indicates perfect negative correlation
r = 0 indicates no correlation
r = 1 indicates perfect positive correlation.
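The big S format translates directly into code. The following is a minimal Python sketch that assumes the data arrive as two lists of numbers of equal length. The function name `pmcc` is hypothetical, and the three-point data set is invented so that the arithmetic can be checked by hand: S_xx = S_yy = 2 and S_xy = 1, so r = 0.5.

```python
import math


def pmcc(xs, ys):
    """Product-moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)


print(pmcc([1, 2, 3], [1, 3, 2]))
```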


If r is very close to zero, then you would probably say that the two variables X and Y are not linearly related. If r is very close to 1, for example r = 0.992, then you would probably say that there is a strong positive linear correlation between X and Y. But what about a value for r of 0.694? Would you be able to claim that this indicates positive correlation? What about a value of −0.5? Does this indicate negative correlation between the variables? A significance test is needed!

In order to carry out a significance test, assume that X and Y are jointly normally distributed with correlation coefficient ρ, referred to as the population correlation coefficient. Data must be collected so that they constitute a random sample from the whole population of values of X and Y.


The null hypothesis, H₀

The null hypothesis is always that the population correlation coefficient is zero, i.e. there is no correlation between the variables. This is written H₀: ρ = 0.


The alternative hypothesis, H₁

The alternative hypothesis depends on whether the test is one-tailed or two-tailed.

One-tailed tests

If you think there is a positive correlation between the variables X and Y, the alternative hypothesis is H₁: ρ > 0 (there is a positive correlation between the variables).

If you think there is a negative correlation between the variables X and Y, the alternative hypothesis is H₁: ρ < 0 (there is a negative correlation between the variables).

Two-tailed tests

If you are looking for a correlation but not specifying whether it is positive or negative, then the alternative hypothesis is H₁: ρ ≠ 0 (there is some correlation between the variables).

The calculated value of r, the product-moment correlation coefficient, is compared with the critical value, which is found from tables. An extract is given below and the tables are printed on page 652.




Critical values for product-moment correlation coefficient

Sample    Level
size      0.10      0.05      0.025     0.01      0.005
4         0.8000    0.9000    0.9500    0.9800    0.9900
5         0.6870    0.8054    0.8783    0.9343    0.9587
6         0.6084    0.7293    0.8114    0.8822    0.9172    (i)
7         0.5509    0.6694    0.7545    0.8329    0.8745
8         0.5067    0.6215    0.7067    0.7887    0.8343    (ii)
9         0.4716    0.5822    0.6664    0.7498    0.7977
10        0.4428    0.5494    0.6319    0.7155    0.7645    (iii)
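Where do the table entries come from? Under H₀, with X and Y jointly normal as assumed above, a standard result is that r has probability density f(r) = (1 − r²)^((n − 4)/2) / B(½, (n − 2)/2) for −1 < r < 1. The Python sketch below assumes that result. It integrates the upper tail with Simpson's rule and uses bisection to find the value c with P(r > c) equal to the one-tail level. For example, n = 10 at the 0.05 level gives approximately 0.5494, agreeing with the extract. It is an illustration only, valid for n ≥ 4, and the function names are hypothetical. In practice you read the printed tables.

```python
import math


def r_density(r, n):
    """Density of r when rho = 0 for a bivariate normal sample of size n (n >= 4)."""
    beta = math.gamma(0.5) * math.gamma((n - 2) / 2) / math.gamma((n - 1) / 2)
    return max(0.0, 1 - r * r) ** ((n - 4) / 2) / beta


def upper_tail(c, n, steps=2000):
    """P(r > c), using Simpson's rule on [c, 1] (steps must be even)."""
    h = (1 - c) / steps
    total = r_density(c, n) + r_density(1.0, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * r_density(c + i * h, n)
    return total * h / 3


def critical_r(n, level):
    """Critical value c with P(r > c) = level (a one-tail level, as in the table)."""
    lo, hi = 0.0, 1.0
    for _ in range(40):  # bisection: the upper-tail probability falls as c rises
        mid = (lo + hi) / 2
        if upper_tail(mid, n) > level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(critical_r(10, 0.05), 4))
```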


The tables are easy to use. The highlighted values are referred to in the following illustrations.

(i) Consider hypotheses

H₀: ρ = 0 (there is no correlation between the variables)
H₁: ρ > 0 (there is a positive correlation between the variables).

This is a one-tailed (upper tail) test. At the 5% level, the critical value is found under column 0.05. If r has been calculated from, say, six pairs of data, i.e. sample size 6, the critical value is 0.7293.

This means that in random samples from a distribution in which ρ = 0, only 5% of these samples will give a value of r greater than 0.7293. So, at the 5% level of significance, you would reject H₀ (that there is no correlation) in favour of H₁ (that there is positive correlation) if r > 0.7293.

(ii) The same tables are used when testing for a negative correlation. Consider hypotheses

H₀: ρ = 0 (there is no correlation between the variables)
H₁: ρ < 0 (there is a negative correlation between the variables).

This test is one-tailed (lower tail). At the 1% level, look up the value in the column headed 0.01. For a sample size of eight pairs of data, the value given in the table is 0.7887, indicating that the critical value is −0.7887. At the 1% level, you would reject H₀ if r < −0.7887.

(iii) Now consider hypotheses

H₀: ρ = 0 (there is no correlation between the variables)
H₁: ρ ≠ 0 (there is some correlation between the variables).

This test is two-tailed. At the 5% level of significance, you want critical values that give 2.5% in each tail, so look under the column headed 0.025. For a sample size of 10, the critical value given in the table is 0.6319. This means that you would reject H₀ in favour of H₁ if r > 0.6319 or r < −0.6319, i.e. if |r| > 0.6319.
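The three decision rules (i) to (iii) can be written as one small function. This Python sketch copies the extract above into a dictionary, so it covers only sample sizes 4 to 10. The names `PMCC_CRITICAL` and `reject_h0` are hypothetical. Note how a two-tailed test at significance level α uses the column headed α/2.

```python
# One-tail level -> critical value, copied from the table extract above
PMCC_CRITICAL = {
    4: {0.10: 0.8000, 0.05: 0.9000, 0.025: 0.9500, 0.01: 0.9800, 0.005: 0.9900},
    5: {0.10: 0.6870, 0.05: 0.8054, 0.025: 0.8783, 0.01: 0.9343, 0.005: 0.9587},
    6: {0.10: 0.6084, 0.05: 0.7293, 0.025: 0.8114, 0.01: 0.8822, 0.005: 0.9172},
    7: {0.10: 0.5509, 0.05: 0.6694, 0.025: 0.7545, 0.01: 0.8329, 0.005: 0.8745},
    8: {0.10: 0.5067, 0.05: 0.6215, 0.025: 0.7067, 0.01: 0.7887, 0.005: 0.8343},
    9: {0.10: 0.4716, 0.05: 0.5822, 0.025: 0.6664, 0.01: 0.7498, 0.005: 0.7977},
    10: {0.10: 0.4428, 0.05: 0.5494, 0.025: 0.6319, 0.01: 0.7155, 0.005: 0.7645},
}


def reject_h0(r, n, significance, alternative):
    """Decide whether to reject H0: rho = 0.

    alternative: 'greater' (H1: rho > 0), 'less' (H1: rho < 0)
    or 'two-sided' (H1: rho != 0). A two-tailed test splits the
    significance level equally between the two tails.
    """
    column = significance / 2 if alternative == "two-sided" else significance
    critical = PMCC_CRITICAL[n][round(column, 4)]
    if alternative == "greater":
        return r > critical
    if alternative == "less":
        return r < -critical
    return abs(r) > critical


print(reject_h0(0.75, 6, 0.05, "greater"))      # illustration (i)
print(reject_h0(-0.80, 8, 0.01, "less"))        # illustration (ii)
print(reject_h0(-0.65, 10, 0.05, "two-sided"))  # illustration (iii)
```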


Example 13.1 


The scatter diagram illustrating ten pairs of values (x, y) is shown below.

(a) Comment on the diagram.

(b) Calculate the value of r, the product-moment correlation coefficient for the pairs of data.

(c) Assuming that X and Y are jointly normally distributed with correlation coefficient ρ, and that the data constitute a random sample, test, at the 5% level, whether there is a positive correlation between X and Y.

(d) Would your conclusion be the same at the 1% level?


Solution 13.1 


(a) From the scatter diagram, there appears to be some positive linear correlation, but it does not appear to be very strong.

(b) In the diagram, the data points are

x    5    8   12   15   15   17   20   20   25   27
y    3   11    9    6   15   13   25   15    …   20

Using the calculator in LR mode, it can be shown that r = 0.6954 (4 d.p.).
(See page 140 if you need to review how to calculate r, with or without a calculator.)
(c) The significance test is carried out as follows:

H₀: ρ = 0 (there is no correlation between X and Y)
H₁: ρ > 0 (there is positive correlation between X and Y)

Perform a one-tailed (upper tail) test at the 5% level.
The sample size is 10.

From tables, the critical value is 0.5494, so reject H₀ if r > 0.5494.
From the calculations in (b), r = 0.6954.

Since r > 0.5494, H₀ is rejected in favour of H₁.
There is evidence of positive correlation between X and Y.
(d) For a test at the 1% level, the critical value is 0.7155, so H₀ is rejected if r > 0.7155.

Since r = 0.6954 < 0.7155, do not reject H₀.

At the 1% level, there is not enough evidence to say that there is positive correlation between X and Y.


Exercise 13a Significance test for product-moment correlation coefficient

1. In each of the following significance tests for the product-moment correlation coefficient, the calculated value of r is as shown. Use tables of critical values to decide whether H₀ is rejected or not.

        n      r        Hypotheses                Level of significance
(a)     7      0.893    H₀: ρ = 0, H₁: ρ ≠ 0              2%
(b)    14      0.499    H₀: ρ = 0, H₁: ρ > 0              1%
(c)    28      0.324    H₀: ρ = 0, H₁: ρ ≠ 0             10%
(d)    28      0.324    H₀: ρ = 0, H₁: ρ > 0              1%
(e)    16     −0.419    H₀: ρ = 0, H₁: ρ < 0              5%
(f)    12      0.689    H₀: ρ = 0, H₁: ρ ≠ 0             10%
(g)    12      0.689    H₀: ρ = 0, H₁: ρ > 0              1%
(h)    10      0.733    H₀: ρ = 0, H₁: ρ > 0              1%

2. A small bus company provides a service for a small town and some neighbouring villages. In a study of their service a random sample of 20 journeys was taken and the distances x, in kilometres, and journey times t, in minutes, were recorded. The average distance was 4.535 km and the average journey time was 15.15 minutes.

(a) Using Σx² = 493.77, Σt² = 4897, Σxt = 1433.8, calculate the product-moment correlation coefficient for these data.

(b) Stating your hypotheses clearly, test, at the 5% level, whether or not there is evidence of a positive correlation between journey time and distance.

(c) State any assumptions that have to be made to justify the test in (b). (L)

3. In order to investigate the strength of the correlation between the value of a house and the value of the householder's car, a random sample of householders was questioned. The resulting data are shown in the table, the units being thousands of pounds.

Value of house, x    Value of car, y
110                       12
106                        9.5
 51                        2.4
 94                        4.2
 66                        4.1
 26                        0.3
 72                        3.2
 51                        6.0
 53                        7.8
133                       15

Σx = 762    Σx² = 68088    Σy = 64.5    Σy² = 606.63    Σxy = 6067.4

(a) Represent the data graphically.

(b) Calculate the product-moment correlation coefficient.

(c) Carry out a hypothesis test, at a suitable level of significance, to determine whether or not it is reasonable to suppose that the value of a house is positively correlated with the value of the householder's car.

(d) A student argues that when two variables are correlated one must be the cause of the other. Briefly discuss this view with regard to the data in this question. (MEI)


4. For the sets of data given, test the hypotheses indicated. Then draw a scatter diagram and comment on whether this reinforces your conclusion.
[ρ is the population product-moment correlation coefficient.]

(a) …

H₀: ρ = 0, H₁: ρ < 0; 5% significance level

(b)
x        y
…        5.3
5.4     10.2
5.5     15.7
10        …
10.2    10.9
10.4    15.1
15       5.3
15.4    10.9
15.6    15.3
30      25.1
20.2    20

H₀: ρ = 0, H₁: ρ > 0; 1% significance level
SPEARMAN'S COEFFICIENT OF RANK CORRELATION, rₛ


Spearman's coefficient of rank correlation, rₛ, is calculated using the ranks of the data. As you saw on page 146, for n data points, if d is the difference between the ranks for a data point, then

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
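The formula can be sketched in Python. The hypothetical helper `ranks` gives tied values the mean of the ranks they occupy. With ties, the formula above is then only an approximation to rₛ. Ranking both variables from smallest to largest, or both from largest to smallest, gives the same rₛ, because reversing both rankings only changes the sign of each d. The function names are hypothetical. The usage line reproduces the value in Example 13.2 from Σd² = 30 and n = 11.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_from_d2(sum_d2, n):
    """r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def spearman(xs, ys):
    """Rank both data sets, then apply the formula to the rank differences d."""
    rx, ry = ranks(xs), ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return spearman_from_d2(sum_d2, len(xs))


print(round(spearman_from_d2(30, 11), 4))
```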


Remember that −1 ≤ rₛ ≤ 1, with rₛ = 1 indicating perfect agreement between the rankings, rₛ = −1 indicating that the rankings are in exact reverse order (complete disagreement) and rₛ = 0 indicating no correlation between the rankings.

Writing ρₛ for the population rank correlation coefficient, the null hypothesis is always

H₀: ρₛ = 0 (there is no correlation between the rankings).


The alternative hypothesis is either

H₁: ρₛ > 0 (there is positive correlation, or agreement, between the rankings): a one-tailed (upper tail) test

or H₁: ρₛ < 0 (there is negative correlation, or disagreement, between the rankings): a one-tailed (lower tail) test

or H₁: ρₛ ≠ 0 (there is correlation between the rankings): a two-tailed test.

Note that the test for Spearman's coefficient of rank correlation does not make any assumptions about the population parameters; in particular, X and Y do not need to be jointly normally distributed. It is known as a non-parametric test.


The critical values for Spearman's rank correlation coefficient are found from tables which are very similar in format to those for the product-moment correlation coefficient. An extract is shown below and the tables are printed on page 652.


Critical values for Spearman's rank correlation coefficient

Sample    Level
size      0.05      0.025     0.01
4         1.0000    -         -
5         0.9000    1.0000    1.0000
6         0.8286    0.8857    0.9429
7         0.7143    0.7857    0.8929
8         0.6429    0.7381    0.8333
9         0.6000    0.7000    0.7833
10        0.5636    0.6485    0.7455
11        0.5364    0.6182    0.7091


For a one-tailed test at the 5% level, sample size 7, look under column 0.05. This gives the value 0.7143 and means that

• for H₁: ρₛ > 0, H₀ is rejected if rₛ > 0.7143
• for H₁: ρₛ < 0, H₀ is rejected if rₛ < −0.7143.

For a two-tailed test at the 5% level, sample size 9, look under column 0.025 (half of 5%). This gives the value 0.7000 and means that

• for H₁: ρₛ ≠ 0, H₀ is rejected if rₛ > 0.7000 or rₛ < −0.7000, i.e. if |rₛ| > 0.7000.


Example 13.2 


A teacher selects one boy and one girl at random from her class, and asks them to arrange 11 types of food in order of preference. The food types are labelled A to K and the results are given below.

Boy's order:   …
Girl's order:  …

(a) Calculate Spearman's rank correlation coefficient for these data.

(b) Stating your hypotheses clearly, test, at the 1% level of significance, whether or not there is evidence of a positive correlation.

(c) Interpret your conclusion to the test in part (b). (L)
Solution 13.2

(a)
Food type          A   B   C   D   E   F   G   H   I   J   K
Boy's order, x     …
Girl's order, y    …
d                  …
d²                 …

Σd² = 30 and n = 11, so

$$r_s = 1 - \frac{6 \times 30}{11 \times (11^2 - 1)} = 1 - \frac{180}{1320} = 0.8636\ldots$$

(b) The significance test is carried out as follows:

H₀: ρₛ = 0 (there is no correlation)
H₁: ρₛ > 0 (there is positive correlation between the boy's and girl's preferences)

Perform a one-tailed (upper tail) test at the 1% level.
The sample size is 11.

From tables (page 652), the critical value is 0.7091, so reject H₀ if rₛ > 0.7091.
From part (a), rₛ = 0.8636...

Since rₛ > 0.7091, H₀ is rejected in favour of H₁.
There is evidence of positive correlation between the boy's and girl's preferences.

(c) There is evidence that the boy and girl tend to agree in their food preferences.


Exercise 13b Significance test for Spearman’s coefficient of rank correlation

1. In each of the following significance tests for Spearman’s rank correlation coefficient, the value of $\Sigma d^2$ obtained when calculating $r_s$ is as shown. Use tables of critical values to decide whether $H_0$ is rejected or not.

        n     $\Sigma d^2$    Hypotheses                                  Level of significance
(a)     9     212             $H_0: \rho_s = 0$, $H_1: \rho_s < 0$        1%
(b)     8     30              $H_0: \rho_s = 0$, $H_1: \rho_s > 0$        5%
(c)     8     30              $H_0: \rho_s = 0$, $H_1: \rho_s \neq 0$     5%
(d)     10    78              $H_0: \rho_s = 0$, $H_1: \rho_s > 0$        5%
(e)     10    252             $H_0: \rho_s = 0$, $H_1: \rho_s < 0$        5%
(f)     10    274             $H_0: \rho_s = 0$, $H_1: \rho_s \neq 0$     5%
(g)     …     18              $H_0: \rho_s = 0$, $H_1: \rho_s \neq 0$     10%
(h)     7     106             $H_0: \rho_s = 0$, $H_1: \rho_s < 0$        1%


Find, to three decimal places, the Spearman rank correlation coefficient between the order of manufacture and the order given by the expert.

Refer to tables of critical values to comment on the significance of your result. State clearly the null hypothesis which is being tested. (L)


3. Applicants for a job with a company are interviewed by two of the personnel staff. After the interviews each applicant is awarded a mark by each of the interviewers. The marks are given below.


        7     14              $H_0: \rho_s = 0$, $H_1: \rho_s \neq 0$     5%


Candidate       A    B    C    D    E    F    G    H


Interviewer. 1°22: 27° 24 17920552216: 43 
Interviewer 2   28   23   25   14   26   17   20   45


2. An expert on porcelain is asked to place 7 china bowls in date order of manufacture, assigning the rank 1 to the oldest bowl. The actual dates of manufacture and the order given by the expert are shown.

Bowl    Date of manufacture    Order given by expert


1920 
1857. 
1710 
1896 
1810 
1690 
1780 


Qi Ee OS oy 
Wem NE Go 


(a) Calculate, to two decimal places, the Spearman rank correlation coefficient between these two sets of marks.

(b) Stating your hypotheses and using a 5% level of significance, interpret your result. (L)
4. Ten architects each produced a design for a new building and two judges, A and B, independently awarded marks, x and y respectively, to the 10 designs, as given in the table below.

Design    Judge A (x)    Judge B (y)
1         50             46
2         35             26
3         55             48
4         60             44
5         85             62
6         …              48
7         …              30
8         90             60
9         45             34
10        40             42

Calculate Spearman’s rank correlation coefficient for the data and test, at the 5% level, the hypothesis that there is no correlation between the marks awarded by the two judges. (C)

5. In a ski-jumping contest each competitor made two jumps. The orders of merit for the 10 competitors who completed both jumps are shown in the table.

Ski jumper    First jump    Second jump
A             2             4
B             9             10
C             7             5
D             4             1
E             10            8
F             8             9
G             6             2
H             5             7
I             1             3
J             3             6

(a) Calculate, to two decimal places, a rank correlation coefficient for the performances of the ski-jumpers in the two jumps.
(b) Using a 5% level of significance and quoting from tables of critical values, interpret your result. State clearly your null and alternative hypotheses. (L)

6. The positions in a league of 8 hockey clubs at the end of a season are shown in the table. Shown also are the average attendances (in hundreds) at home matches during that season. Calculate a coefficient of rank correlation between position in the league and average home attendance.

Club    Position    Average attendance
A       1           30
B       2           32
C       3           …
D       4           …
E       5           27
F       6           18
G       7           …
H       8           25

Refer to the appropriate table of critical values to comment on the significance of your result, stating clearly the null hypothesis being tested. (L)
Summary

• Significance test for the product-moment correlation coefficient, $r$

The assumptions are that X and Y are jointly normally distributed and the sample must constitute a random sample from the whole populations of X and Y.

1. State $H_0: \rho = 0$ (there is no correlation between X and Y).
   State $H_1$ as follows:
   $H_1: \rho > 0$ (there is positive correlation between X and Y)
   $H_1: \rho < 0$ (there is negative correlation between X and Y)
   $H_1: \rho \neq 0$ (there is correlation between X and Y)
2. State the level and type of test, e.g. one-tailed test at the 5% level.
3. State the rejection criterion, obtaining the critical value from tables:
   for $H_1: \rho > 0$, reject $H_0$ if $r >$ critical value
   for $H_1: \rho < 0$, reject $H_0$ if $r < -$(critical value)
   for $H_1: \rho \neq 0$, reject $H_0$ if $|r| >$ critical value
4. Calculate $r$ and compare with the critical value.
5. Make your conclusion.

• Significance test for Spearman’s rank correlation coefficient, $r_s$

Note that this is a non-parametric test.

1. State $H_0: \rho_s = 0$ (there is no correlation between the ranks of X and Y).
   State $H_1$ as follows:
   $H_1: \rho_s > 0$ (there is agreement between the ranks of X and Y)
   $H_1: \rho_s < 0$ (there is disagreement between the ranks of X and Y)
   $H_1: \rho_s \neq 0$ (there is correlation between the ranks of X and Y)
2. State the level and type of test, e.g. one-tailed test at the 5% level.
3. State the rejection criterion, obtaining the critical value from tables:
   for $H_1: \rho_s > 0$, reject $H_0$ if $r_s >$ critical value
   for $H_1: \rho_s < 0$, reject $H_0$ if $r_s < -$(critical value)
   for $H_1: \rho_s \neq 0$, reject $H_0$ if $|r_s| >$ critical value
4. Calculate $r_s$ and compare with the critical value.
5. Make your conclusion.


Miscellaneous worked example


Example 13.3

During the lambing season 8 ewes and the lambs they bore were weighed at the time of birth.
Ewe                       A    B    C    D    E    F    G    H
Weight of ewe, x kg 440041 B 40 aE 
38 3. 
Weight of lamb, y kg 3.5 2.8 3.2 2:2: 2.9 2.8, : 
Fe i; ‘| i 2.8 2.6 


You may assume $\Sigma x = 319$, $\Sigma y = 23.0$, $\Sigma x^2 = 12\,785$, $\Sigma y^2 = 66.88$, $\Sigma xy = 923.2$.

Calculate the product-moment correlation coefficient between X and Y.

Making any necessary assumptions, test whether the data could have come from a population with correlation coefficient $\rho = 0$. Use a 5% level of significance. (AEB)




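Solution 13.3

Using small s formula to calculate r:
$$s_{xy} = \frac{\Sigma xy}{n} - \bar{x}\bar{y} = \frac{923.2}{8} - \frac{319}{8} \times \frac{23.0}{8} = 0.759\ldots$$
$$s_x^2 = \frac{\Sigma x^2}{n} - \bar{x}^2 = \frac{12\,785}{8} - \left(\frac{319}{8}\right)^2 = 8.1093\ldots$$
$$s_y^2 = \frac{\Sigma y^2}{n} - \bar{y}^2 = \frac{66.88}{8} - \left(\frac{23.0}{8}\right)^2 = 0.0943\ldots$$
$$r = \frac{s_{xy}}{s_x s_y} = \frac{0.759\ldots}{\sqrt{8.1093\ldots} \times \sqrt{0.0943\ldots}} = 0.868 \text{ (3 s.f.)}$$

Using big S formula to calculate r:
$$S_{xy} = \Sigma xy - \frac{\Sigma x \, \Sigma y}{n} = 923.2 - \frac{319 \times 23.0}{8} = 6.075$$
$$S_{xx} = \Sigma x^2 - \frac{(\Sigma x)^2}{n} = 12\,785 - \frac{319^2}{8} = 64.875$$
$$S_{yy} = \Sigma y^2 - \frac{(\Sigma y)^2}{n} = 66.88 - \frac{23.0^2}{8} = 0.755$$
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{6.075}{\sqrt{64.875 \times 0.755}} = 0.868 \text{ (3 s.f.)}$$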


The product moment correlation coefficient between the weight of a ewe and the weight of its 
lamb is 0.868. 
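The big S route is easy to check by machine. A minimal Python sketch (the function name is hypothetical) that computes r from the five totals given in the example:

```python
from math import sqrt


def pmcc_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product-moment correlation coefficient from the totals,
    using S_xy / sqrt(S_xx * S_yy)."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


# Totals from Example 13.3 (n = 8 ewes and lambs)
r = pmcc_from_totals(8, 319, 23.0, 12785, 66.88, 923.2)
print(round(r, 3))  # 0.868
```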


Assume that X and Y are jointly normally distributed with product-moment correlation coefficient $\rho$ and the data form a random sample from the populations of X and Y. The significance test is carried out as follows:

1. State $H_0$ and $H_1$.
$H_0: \rho = 0$ (there is no correlation between the weight of a ewe and its lamb)
$H_1: \rho \neq 0$ (there is correlation between the weight of a ewe and its lamb)

2. State level and type of test.
Perform a two-tailed test at the 5% level.

3. State the rejection criterion.
The sample size is 8. From tables, the critical value for a two-tailed test at the 5% level is 0.7067 (page 652, row $n = 8$, column 0.025). $H_0$ is rejected if $|r| > 0.7067$.

4. Calculate $r$.
For the data, $r = 0.868$.

5. Make conclusion.
Since $|r| > 0.7067$, $H_0$ is rejected in favour of $H_1$. There is evidence of correlation between the weight of a ewe and its lamb. It is unlikely that the data came from a population with correlation coefficient $\rho = 0$.


Note that the conclusion would have been the same if you had chosen to carry out a one-tailed test. In this case $H_1$ is $\rho > 0$, the critical value of $r$ is 0.6215 and $H_0$ is rejected since $r > 0.6215$.


Example 13.4

The coursework grades, A highest to G lowest, and examination marks of 8 candidates are given below.


Coursework Examination 


Grade Mark 


92. 
75 
63 
34 
48 
45 
34 
18 


RQ Our we tay 


(a) Calculate the value of an appropriate measure of correlation between these two sets of data.

(b) Test whether this value indicates evidence of correlation between coursework grades and examination grades at a 5% significance level.

(c) Give a practical interpretation of your value.


Solution 13.4

(a) Calculating Spearman’s rank correlation coefficient, $r_s$:


Coursework 


Examination mark 
48 


Coursework rank 1 35 5 45 34 18 
Examination mark rank { 3 5 , 7 3.5 8 6 

|d|       0    1.5    2    2    2    2.5     1    2
d²        0    2.25   4    4    4    6.25    1    4


(AQA) 


$\Sigma d^2 = 25.5$, $n = 8$, therefore
$$r_s = 1 - \frac{6\Sigma d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 25.5}{8 \times 63} = 0.696 \text{ (3 s.f.)}$$

(b) $H_0: \rho_s = 0$ (there is no correlation)
$H_1: \rho_s \neq 0$ (there is correlation)
Perform a two-tailed test at the 5% level.
From tables (page 651), the critical value is 0.7381 ($n = 8$, column 0.025).
Reject $H_0$ if $|r_s| > 0.7381$.
Since $r_s = 0.696 < 0.7381$, do not reject $H_0$. There is no evidence of correlation.

(c) There is no evidence that performance in the examination reflects performance in coursework.
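Two candidates share the examination mark 34, so they must share a rank. A minimal Python sketch of that ranking step follows; the function name is hypothetical, and it assumes, as the solution appears to, that rank 1 goes to the highest mark. Tied values receive the mean of the ranks they would otherwise occupy.

```python
def ranks_descending(values):
    """Rank values with 1 = largest; tied values share the mean
    of the ranks they would otherwise occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1      # first position of v in the ordering
        count = order.count(v)          # how many times v occurs
        ranks.append(first + (count - 1) / 2)
    return ranks


marks = [92, 75, 63, 34, 48, 45, 34, 18]
print(ranks_descending(marks))
# [1.0, 2.0, 3.0, 6.5, 4.0, 5.0, 6.5, 8.0]
```

When there are ties, $1 - 6\Sigma d^2/(n(n^2 - 1))$ is only an approximation to the product-moment coefficient calculated from the ranks, although it is the formula used in the solution above.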


Miscellaneous exercise 13c 


1. To test the belief that milder winters are followed by warmer summers, meteorological records are obtained for a random sample of years. For each year the mean temperatures are found for January and July. The data, in degrees Celsius, are given below.

Jan     July
8.3     16.2
7.4     13.1
9.0     16.7
1.8     11.2
3.5     14.9
4.7     15.1
5.8     17.7
6.0     17.3
2.7     12.3
2.4     13.4

(a) Rank the data and calculate Spearman’s rank correlation coefficient.
(b) Test, at the 2.5% level of significance, the belief that milder winters are followed by warmer summers. State clearly the null and alternative hypotheses under test.
(c) Would it be more appropriate, less appropriate or equally appropriate to use the product-moment correlation coefficient to analyse these data? Briefly explain why.


2. Bird abundance may be assessed in several ways. In one long-term study in a nature reserve, two independent surveys (A and B) are carried out. The data show the number of wren territories recorded (survey A) and the numbers of adult wrens trapped in a fine mesh net (survey B) over a number of years.


Survey A Survey B 
ie ae 
19 412 
27 15 
50. 18 
60 22 
70 35 
79. 28 3S. 

79. 7A 
184 46 
85 53 
97: 52. 


(a) Plot a scatter diagram to compare results for the two surveys.
(b) Calculate Spearman’s coefficient of rank correlation.
(c) Perform a significance test, at the 5% level, to determine whether there is any association between the results of the two surveys. Explain what your conclusion means in practical terms.
(d) Would it be more appropriate, less appropriate or equally appropriate to use the product-moment correlation coefficient to analyse these data? Briefly explain why.


3. The data below show the height above sea level, x metres, and the temperature, y °C, at 7.00 a.m. on the same day in summer at nine places in Europe.

Height, x    Temperature, y
1400         6
400          15
280          18
790          10
390          16
590          14
540          13
1250         7
680          13

(a) Plot these data on a scatter diagram.
(b) Calculate the product-moment correlation coefficient between x and y.
(Use $\Sigma x^2 = 5\,639\,200$, $\Sigma y^2 = 1524$, $\Sigma xy = 66\,450$.)
(c) Give an interpretation of your coefficient.
On the same day the number of hours of sunshine was recorded and Spearman’s rank correlation coefficient between hours of sunshine and temperature, based on $\Sigma d^2 = 28$, was 0.767.
(d) Stating clearly your hypotheses and using a 5% two-tailed test, interpret this rank correlation coefficient. (L)


4. At the end of a season a league of eight ice hockey clubs produced the following table showing the position of each club in the league and the average attendances (in hundreds) at home matches.

Club                  A    B    C    D    E    F    G    H
Position              …
Average attendance    37   38   19   27   34   26   22   32

(a) Calculate the Spearman rank correlation coefficient between position in the league and average home attendance.
(b) Stating clearly your hypotheses and using a 5% two-tailed test, interpret your rank correlation coefficient.

Many sets of data include tied ranks.

(c) Explain briefly how tied ranks can be dealt with. (L)

5. At an agricultural show ten Shetland sheep were ranked by a qualified judge and by a trainee judge. Their rankings are shown in the table.

Qualified judge    Trainee judge
1                  1
2                  2
3                  5
4                  6
5                  7
6                  8
7                  10
8                  4
9                  3
10                 9

Calculate a rank correlation coefficient for these data.

Using one of the tables provided and a 5% significance level, state your conclusions as to whether there is some degree of agreement between the two sets of ranks. (L)

6. A teacher recorded the following data which refer to the marks gained by 13 children in an aptitude test and a statistics examination.

Child    Aptitude test, x    Statistics examination, y
A        54                  84
B        52                  68
C        42                  71
D        31                  37
E        43                  79
F        23                  58
G        32                  33
H        49                  60
I        37                  47
J        13                  60
K        13                  44
L        36                  64
M        39                  49

(a) Draw a scatter diagram to represent these two sets of marks.
(b) Calculate, to three decimal places, the product-moment correlation coefficient between the test mark and the examination mark.
(You may use $\Sigma x^2 = 18\,672$, $\Sigma y^2 = 46\,626$, $\Sigma xy = 28\,234$.)
(c) Comment on your result.
(d) The teacher decided that, on the basis of the scatter diagram, children F, J and K performed differently from the rest of the group. Suggest why the teacher might have come to that decision.
(e) The teacher decides to analyse the data ignoring these three children. Calculate, to three decimal places, the Spearman rank correlation coefficient between the other ten pairs of observations.
(f) Using a 5% level of significance and quoting from tables of critical values, interpret the rank correlation coefficient. Use a one-tailed test. State clearly the null and alternative hypotheses. (L)


7. The yield (per hectare) of a crop, c, is believed to depend on the May rainfall, m. For 9 regions records are kept of the average values of c and m, and these are recorded below.

c       m
8.3     14.7
10.1    10.4
15.2    18.8
6.4     13.1
11.8    14.9
12.2    13.8
13.4    16.8
11.9    11.8
9.9     12.2

($\Sigma c = 99.2$, $\Sigma m = 126.5$, $\Sigma c^2 = 1150.16$, $\Sigma m^2 = 1832.07$, $\Sigma mc = 1427.15$)

(a) Find the equation of the appropriate regression line.
(b) Find r, the linear (product-moment) correlation coefficient between c and m.
(c) In a tenth region the average May rainfall was 14.6. Estimate the average yield of the crop for that region, giving your answer correct to one decimal place.
(d) Calculate the value of $r_s$, Spearman’s rank correlation coefficient, for the above data and determine whether it is significantly greater than zero at the 5% level.
(e) State, with a reason, which of $r$ and $r_s$ you regard as being more appropriate for these data. (C)


8. In a random sample of 8 areas, residents were asked to express their approval or disapproval of the services provided by the local authority. A score of 0 represented complete dissatisfaction, and 10 represented complete satisfaction. The table below shows the mean score for each local authority together with the authority’s level of community charge.

Authority    Community charge (£)    Approval rating
A            485                     3.0
B            490                     4.4
C            378                     5.0
D            451                     4.6
E            384                     4.1
F            352                     5.5
G            420                     5.8
H            212                     6.1

Calculate Spearman’s rank correlation coefficient for these data.

Carry out a significance test at the 5% level using the value of the correlation coefficient which you have calculated. State carefully the null and alternative hypotheses under test and the conclusion to be drawn. (MEI)


9. A local historian was studying the number of births in a town and found the following figures relating to the years 1925 to 1934.

Male births, x    Female births, y
223               219
218               205
223               209
223               239
242               252
278               256
299               254
256               257
255               259
292               323

(a) Draw a scatter diagram to illustrate this information.

The historian calculated the following summary statistics from the data:
$S_{xx} = 8276.9$, $S_{yy} = 10\,230.1$, $S_{xy} = 7206.3$.

(b) Calculate the product-moment correlation coefficient.

The historian believed these data gave strong evidence of a positive correlation between male and female births.

(c) Stating your hypotheses clearly, test at the 1% level of significance whether or not there is evidence to support the historian’s belief.
(d) State an assumption required for the validity of the test in part (c) and comment on whether or not you consider it to be met.

In 1924 there were 249 male and 177 female births.

(e) Without carrying out any further calculations state, giving a reason, what effect the inclusion of these figures would have on the value of the product-moment correlation coefficient. (L)

10. Seven rock samples taken from a particular locality were analysed. The percentages, C and M, of two oxides contained in each sample were recorded. The results are shown in the table.

Sample    C       M
1         0.60    1.06
2         0.42    0.72
3         0.51    0.94
4         0.56    1.04
5         0.31    0.84
6         1.04    1.16
7         0.80    1.24

Given that $\Sigma CM = 4.459$, $\Sigma C^2 = 2.9278$, $\Sigma M^2 = 7.196$, find, to three decimal places, the product-moment correlation coefficient of the percentages of the two oxides. Calculate also, to three decimal places, a rank correlation coefficient. Using tables, state any conclusions which you draw from the value of your rank correlation coefficient. State clearly the null hypothesis being tested. (L)

11. In the table below, x is the average weekly household income in £ and y the infant mortality per 1000 live births in 11 regions of the UK in 1985.

Region    x        y
A         170.4    8.4
B         183.2    9.4
C         172.9    10.3
D         187.4    10.5
E         203.2    8.3
F         204.8    9.4
G         208.8    8.5
H         248.0    9.0
I         198.3    9.4
J         187.1    9.8
K         179.1    9.6

It is hypothesised that a high value of x will be associated with a low value of y. Calculate a rank correlation coefficient and test its significance.

It appears that region A is exceptional. What would your findings be if this region were omitted from the analysis?


Mixed test 13A Correlation Coefficients

1. The bivariate sample illustrated in the scatter diagram below shows the heights, x cm, and masses, y kg, of a random sample of 20 students.

[Scatter diagram: mass (y kg) against height (x cm) for the 20 students]

You are given that $\Sigma x = 3358$, $\Sigma x^2 = 567\,190$, $\Sigma y = 1225$, $\Sigma y^2 = 76\,357$, $\Sigma xy = 206\,680$.

(a) Calculate the product-moment correlation coefficient.
(b) Carry out a hypothesis test, at the 5% level of significance, to determine whether or not there is evidence that the height of a student is positively correlated with his or her mass. What feature of the scatter diagram suggests that this test is appropriate?
(c) A Statistics student suggests that a positive correlation between height and mass implies that ‘the taller a student is the heavier he or she will be’. Comment on this statement with reference to your conclusions in part (b). (MEI)

2. Two judges give marks for artistic impression (out of a maximum 6.0) to 10 ice skaters.

Skater    Judge 1    Judge 2
A         5.3        5.4
B         4.9        5.6
C         5.6        5.8
D         5.2        5.6
E         5.7        5.2
F         4.8        4.5
G         5.2        4.7
H         4.6        4.8
I         …          5.3
J         4.9        4.9

(a) Calculate the value of Spearman’s rank correlation coefficient for the marks of the two judges.
(b) Use your answer to (a) to test, at the 5% level of significance, whether it appears that there is some overall agreement between the judges. State your hypotheses and conclusions carefully.
(c) For these marks the product-moment correlation coefficient is 0.6705. Use this to test, at the 5% level, whether there is correlation between assessments of the two judges.
(d) Comment on which is the more appropriate test to use in this situation.


3. It is hypothesised that there is a positive correlation between the population of a country and its area. The following table gives a random sample of 13 countries with their area, x, in thousands of square kilometres, and population, y, in millions.

Country    x      y
1          2.5    0.5
2          28     5
3          30     2
4          72     4
5          98     42
6          121    24
7          128    16
8          176    3
9          239    14
10         313    37
11         407    6
12         435    17
13         538    22

Plot a scatter diagram and comment on its implication for the hypothesis.

Calculate a suitable correlation coefficient and test its significance at the 5% level.


4. A psychologist was studying the relationship between short-term memory and …. A random sample of 8 students … seconds, and the students were asked to recall as many of the objects as they could. The number of objects recalled correctly was recorded and compared with each student’s mark (percentage) in a recent mathematics examination. The results are given below.

Student           A    B    C    D    E    F    G    H
No. of objects    …
Maths %           56   64   75   69   48   63   52   84

(a) Calculate Spearman’s rank correlation coefficient for these data.
(b) Using a 5% level of significance, and stating your hypotheses clearly, interpret your result.
(c) Give a reason why it may be more appropriate to use Spearman’s rank correlation coefficient for the hypothesis test than the product-moment correlation coefficient. (L)
Contents

Using a Spreadsheet 617 
Using Internet Resources and Word 620 
Using Autograph 622 
Ch.1 Representation and summary of data 625 
Discrete/continuous, frequency density 627 

Ch.2 Correlation and regression 630 
Ch.3 Probability 632 
Ch.4, 5 Discrete random variables 635 
Ch.6, 7, 8 The normal distribution 637 
Ch.9 Sampling and estimation 640 
Ch.10, 11 Hypothesis tests 642 

INTRODUCTION 


The intention of this supplement is to explore the use of ICT in the teaching and learning of statistics and probability. It is now well established that dynamic and interactive computer images can bring subjects to life in a way that was impossible to imagine before. The principal benefit is that lessons can now have variety. The same topic can be presented in the traditional way on the board, explored in a practical simulation, investigated using a spreadsheet or illustrated using a graph plotter. There is also quite likely to be a useful JAVA applet or some interesting real data on it from the internet.

Furthermore, students can now present their findings electronically, and teachers can store, share and continually refine their lesson plans. Add to all this the obvious benefit of the computer’s ability to carry out calculations without effort, and ICT methods are almost guaranteed to enhance the enjoyment of those teaching and studying this subject.


USING A SPREADSHEET 


A spreadsheet, such as Excel, has enormous power; it can effortlessly analyse huge data sets and conduct simple simulations. Getting familiar with all this can come only with practice, and there are plenty of spreadsheet tutorials around, both in print and on the net. For starters, here is a summary of some of the more important features that relate to statistics; in each of the following pages, features that relate to specific topics are listed.




Entering formulae and functions 


Enter ‘=’, or click on the ‘=’ button. Use the functions menu that is now inserted to choose a function, or type it in if you know it. A helpful entry box then shows what each parameter is, with a useful HELP link. The appropriate formulae are given under the chapter headings.


Array 


This word is used for a continuous selection of cells in a spreadsheet. Arrays are referred to by the top-left and bottom-right coordinates, e.g. A2:B16.


The edit menu 


To insert a data set from some other source it is sometimes useful to try:
Edit ⇒ Paste Special ⇒ select ‘Text’

To generate a series, e.g. 1, 2, 3, ..., 1000, first enter the start value in a cell, highlight it, then use:
Edit ⇒ Fill ⇒ Series


The format menu 


To control the format of the numbers:
Format ⇒ Cells ⇒ Number tab ⇒ Number ⇒ Decimal places

To control the appearance of a selected array:
Format ⇒ Column ⇒ Autofit
Format ⇒ AutoFormat (fancy styles)

To control format conditionally:
Format ⇒ Conditional Formatting
e.g. if = 0 then put a frame round it.


The tools menu 


Tools ⇒ Solver: this finds the value of a cell that makes the target cell a max, a min or a given value, e.g. to solve $x^3 - 2 = 0$, set A1 = 1 and B1 = A1^3 - 2. Select B1, then Tools ⇒ Solver. B1 is the ‘target cell’, ‘Equal to Value of’ (0), by changing cell A1. Solve!
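Solver’s own search method is not described here. As a rough analogue only, the Python sketch below finds the same root of $x^3 - 2 = 0$ by bisection (a swapped-in technique chosen for simplicity; the function name is hypothetical), starting from the interval [1, 2], which brackets the root because $f(1) = -1$ and $f(2) = 6$.

```python
def bisect_root(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi] by bisection.
    Assumes f(lo) and f(hi) have opposite signs."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid        # root lies in [lo, mid]
        else:
            lo = mid        # root lies in [mid, hi]
    return (lo + hi) / 2


root = bisect_root(lambda x: x ** 3 - 2, 1, 2)
print(round(root, 6))  # 1.259921
```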


To show formulae instead of results:
Tools ⇒ Options ⇒ View ⇒ Window options
TICK ‘show formulas’ (or use Ctrl-`)

To hide the grid lines:
Tools ⇒ Options ⇒ View ⇒ Window options
UNTICK ‘show gridlines’

To ensure auto-recalculation on pressing F9:
Tools ⇒ Options ⇒ Calculation ⇒ Automatic


For iterative solving, e.g. x = g(x):
Tools ⇒ Options ⇒ Iteration
⇒ Max iterations (100) ⇒ Max change 0.001
e.g. Enter A1 = 1
Select A1
Enter =COS(A1)
Press

For a cell formula referring to itself without iteration:
Tools ⇒ Options ⇒ Iteration
⇒ Max iterations (1) (leave Max change)
(e.g. D1 = D1+1 to increment by 1)
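With iteration switched on, the self-referencing cell A1 = COS(A1) repeatedly replaces x by cos x. A minimal Python sketch of the same fixed-point iteration follows; the function name is hypothetical, and the stopping rule is an assumption meant to mirror the ‘Max iterations’ and ‘Max change’ settings.

```python
from math import cos


def fixed_point(g, x0, max_iter=100, max_change=0.001):
    """Repeatedly replace x by g(x), stopping when the change is less
    than max_change or after max_iter steps."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < max_change:
            return x_new
        x = x_new
    return x


# Start with A1 = 1, then iterate A1 = COS(A1)
print(fixed_point(cos, 1.0))
```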


To draw a histogram (applicable only to equal classes):
Tools ⇒ Data Analysis ⇒ Histogram
(see Chapter 1)

For random number generation (see Chapter 9):
Tools ⇒ Data Analysis ⇒ Random Number Generation
Continuous Uniform (a, b), Normal (μ, σ), Bernoulli (p), Binomial (n, p), Poisson (λ),
Discrete, in 2 columns: x, P(X = x)

For sampling n items from an array:
Tools ⇒ Data Analysis ⇒ Sampling
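Similar random numbers can be generated with Python’s standard random module. A minimal sketch, with parameter values and the data column made up for illustration: binomial values are built here as counts of successes in n Bernoulli trials, and `sample()` draws without replacement, which may differ from the spreadsheet’s Sampling tool.

```python
import random

rng = random.Random(2024)  # fixed seed so the run can be repeated

uniform = [rng.uniform(0, 1) for _ in range(5)]                   # Uniform(a=0, b=1)
normal = [rng.gauss(50, 10) for _ in range(5)]                    # Normal(mean 50, sd 10)
bernoulli = [1 if rng.random() < 0.3 else 0 for _ in range(5)]    # Bernoulli(p=0.3)
# Binomial(n=10, p=0.3) as the number of successes in 10 Bernoulli trials
binomial = [sum(rng.random() < 0.3 for _ in range(10)) for _ in range(5)]

# Sampling 3 items (without replacement) from an illustrative data column
data = [12, 15, 18, 22, 35, 28, 46, 53]
sample = rng.sample(data, 3)

print(uniform, normal, bernoulli, binomial, sample, sep="\n")
```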


For t-tests:
Tools ⇒ Data Analysis
⇒ t-test (paired 2 samples for means)
   2-sample assuming equal variances
   2-sample assuming unequal variances

For the forms toolbar:
Tools ⇒ Customize ⇒ Forms
Label, Group-box, Check-box, Option button, List-box, Combo-box, Scroll-bar, Spinner


The data menu 


Click first on the top left hand corner of the data. 


Sorting and filtering:
Data ⇒ Sort (up to 3 columns deep)
Data ⇒ Filter ⇒ AutoFilter

Converting text into columns (useful if a data set has pasted incorrectly into a single column):
Data ⇒ Text to Columns

For re-interpreting multiple data sets:
Data ⇒ PivotTable report


Charts 


Excel was never written as an educational tool, so it is worth getting to know the charts option to sort out what is going to be useful and what is purely decorative. Of the chart options available, ‘Column’, ‘Bar’, ‘Line’ and ‘XY (Scatter)’ all work well, but Excel is not good at histograms or time series moving averages.


USING INTERNET RESOURCES AND WORD 


There is an ever-growing amount of useful information on the net for the study of statistics 
and probability. Mixed in with all that is the less useful, and the task of sifting out quality 
resources is getting harder. Listed here are some authenticated sites, but the pace of change 
being what it is there is no guarantee that they will still be there and still be useful by the time 


this is read! 


Data sets 


http://lib.stat.cmu.edu/DASL/
DASL: The Data and Story Library (USA), categorised by topic.

http://www.maths.uq.edu.au/~gks/data
OzDASL (University of Queensland, Australia). Australian version of the above.

http://forum.swarthmore.edu/workshops/sum96/data.collections/datalibrary/
The Data Library (from the Math Forum, USA)

http://www.ni.com.au/mercury/mathguys/mercindx.htm
Chance and Data (from Tasmania, Australia)

http://lottery.merseyworld.com/
The UK Lottery Web Site (includes statistics from all the draws)

http://www.fa-premier.com/results/
UK Premier Football results and statistics

http://www.stats.ox.ac.uk/links/schoenfield.htm
Schoenfield’s List of Data Archives (Oxford University)

http://sunsite.unc.edu/lunarbin/worldpop
Demographic statistics (including up-to-the-minute world population)

http://www.nist.gov/itl/div898/strd/
US Statistics Reference Database

http://www.statistics.gov.uk
UK National Statistics

http://www.un.org/Pubs/CyberSchoolBus/infonation/e_infonation.htm


Teaching statistics 


http://www.rss.org.uk
Royal Statistical Society

http://science.ntu.ac.uk/rsscse/TS/
Teaching Statistics magazine: home page

http://www.stats.gla.ac.uk:80/cti/
CTI Statistics (changing to LTSN-CMSOR)

http://www.kuleuven.ac.be/ucs/java/index.htm
JAVA Statistics: some fascinating ‘applets’ from Belgium

http://surfstat.newcastle.edu.au/surfstat/main/surfstat.html
Australia: online text. An introductory course by Annett Dobson et al., Newcastle University

http://cast.massey.ac.nz
CAST: Computer Assisted Statistics Teaching, a complete course, by Doug Stirling, Massey University, Palmerston North, New Zealand

http://193.61.107.61/volume
DISCUSS statistics teaching resources
An important contribution to the understanding of probability and statistics from the team at …

http://www.mis.coventry.ac.uk/~styrrell/resource.htm
Personal selection of statistical web resources from Sidney Tyrrell, Coventry University


Getting all this into Word

Any text or graphics can be copied straight into Word. Simply:

1. Mark any text you want, or hover over any graphic you want.
2. Right-click ‘Copy’ (or Ctrl-C).
3. Click where you want to insert it in Word.
4. Right-click ‘Paste’ (or Ctrl-V).

It is often better to paste into a text box, so you have more control over the layout and positioning. Note: any internet links (underlined in blue) will be copied too.

Copying a data set:

If data is presented on the web page in columns, it should copy and paste in TAB-separated (.tsv) or COMMA-separated (.csv) format. Pasting into Word should therefore also put it in columns. You may need to adjust the TABS settings to suit the data once it is in.

Copying into EXCEL can be less successful, with all the data often ending up in the first column. If this happens, use the ‘Text to Columns’ feature in the ‘DATA’ menu, and try ‘Tab’, ‘Comma’ or ‘Space’ until it works. Alternatively try
Edit ⇒ Paste Special and select ‘Text’.
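Trying ‘Tab’, ‘Comma’ and ‘Space’ in turn is easy to automate. The Python sketch below (function name hypothetical, standard csv module only) splits pasted text with the first delimiter that gives every row the same number of columns, and otherwise falls back to a single column, which is what happens when the paste goes wrong.

```python
import csv
import io


def split_pasted_text(text):
    """Try TAB, comma and space in turn and return the rows from the
    first delimiter that gives every row the same number (> 1) of columns."""
    for delim in ("\t", ",", " "):
        rows = list(csv.reader(io.StringIO(text), delimiter=delim,
                               skipinitialspace=True))
        rows = [row for row in rows if row]        # ignore blank lines
        widths = {len(row) for row in rows}
        if len(widths) == 1 and widths.pop() > 1:
            return rows
    # Nothing worked: everything stays in one column
    return [[line] for line in text.splitlines() if line]


pasted = "Height\tTemperature\n1400\t6\n400\t15\n"
print(split_pasted_text(pasted))
# [['Height', 'Temperature'], ['1400', '6'], ['400', '15']]
```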


USING AUTOGRAPH 


Autograph is a dynamic graphing package that operates in both bivariate and single-variable modes. In the bivariate mode, as well as a full range of equations and coordinate geometry operations, data sets can be represented as scatter diagrams. In the single-variable mode, data can be displayed in all the usual diagrams, and probability distributions can be drawn. A variety of on-screen calculations are available.

Many of these operations can also be created very effectively on a spreadsheet, and throughout this supplement both approaches are explored.

Bivariate data

In Autograph, the word ‘cursor’ is used to describe a coordinate point that is added by the user, either by ‘point and click’ or by entering coordinates directly.

Most operations are available on the button bar, or through the right-click menu. This is dependent on the selection of objects that has been made, and standard rules for object selection are used.

A bivariate data set can be created in various ways:

(a) By adding ‘cursors’, perhaps in a pattern that will help make a particular teaching point (e.g. mostly in a well-correlated line, but with one outlier). Cursors can of course be moved around at will subsequently. Use the right-click options: ‘Select all cursors’ and ‘Convert to data set’. This will change the cursors into a single data set. You can still move an individual cursor around if you hold …. You can double-click on any one cursor in the data set to open the ‘Edit Data Set’ dialogue box.

(b) By using the ‘Edit Data Set’ dialogue box: here data can be entered directly in pairs, imported by loading a CSV (comma-separated) file, or pasted in from two columns in a spreadsheet.

Data can then be sorted (by x or by y), scaled by any formula, or swapped over. Tick ‘Show Statistics’ to create a dynamically linked set of results, which change if any points are moved.


[Figure: the ‘Edit Data Set’ dialogue box]

A Grouped Data Set can be defined either by its class intervals and frequencies, or by an underlying raw data set. Plotting can treat the data either as continuous or discrete.

[Figure: the raw data dialogue box, with the buttons ‘Select Distr.’, ‘Create Sample’, ‘Recall’, ‘Clear Data’ and ‘Export CSV’]

A data set can be entered directly by typing in the values, imported or pasted from a spreadsheet, or created by sampling from a probability distribution.


CHAPTER 1 REPRESENTATION AND SUMMARY OF DATA 


Data sets come in all shapes and sizes these days. Computers can make light work of presenting data in a digestible form, but users need to take care to use the right tool for the job.


Using a spreadsheet 


The following functions are relevant:

Σx   SUM(array)
Σfx   SUMPRODUCT(array, array)
Σfx²   SUMPRODUCT(array, array, array)
m = (1/n)Σx   AVERAGE(array)
√[(1/n)Σx² − m²]   STDEVP(array)   (STDEV(array) uses the divisor n − 1 instead of n)
n   COUNT(array)
   COUNTIF(array, test)
kth smallest   SMALL(array, k)
kth largest   LARGE(array, k)
Minimum   MIN(array)
Maximum   MAX(array)
Mode   MODE(array)
Median   MEDIAN(array)
Quartiles   QUARTILE(array, q)

q = 0: Min, q = 1: LQ, q = 2: Median, q = 3: UQ, q = 4: Max
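For readers who want to check spreadsheet results another way, here is a minimal Python sketch of several of the functions above. The data are made up, and the interpolation rule in quartile() (0-based position (n − 1)q/4 in the sorted list) is assumed to match Excel's QUARTILE rather than confirmed.

```python
import math

# Hypothetical raw data
data = [3, 7, 8, 5, 12, 14, 21, 13, 18]

n = len(data)                                                     # COUNT(array)
mean = sum(data) / n                                              # AVERAGE(array)
sd_n = math.sqrt(sum(x * x for x in data) / n - mean ** 2)        # STDEVP(array)
sd_n1 = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))   # STDEV(array)
count_over_10 = sum(1 for x in data if x > 10)                    # COUNTIF(array, ">10")
second_smallest = sorted(data)[1]                                 # SMALL(array, 2)
second_largest = sorted(data, reverse=True)[1]                    # LARGE(array, 2)


def quartile(values, q):
    """QUARTILE(array, q), q = 0..4: linear interpolation at the 0-based
    position (n - 1) * q / 4 of the sorted values (assumed rule)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 4
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


# Grouped data: SUMPRODUCT gives the sums of f*x and f*x*x
x_vals = [1, 2, 3, 4]
f_vals = [2, 5, 2, 1]
sum_fx = sum(f * x for f, x in zip(f_vals, x_vals))        # SUMPRODUCT(f, x)
sum_fx2 = sum(f * x * x for f, x in zip(f_vals, x_vals))   # SUMPRODUCT(f, x, x)

print([quartile(data, q) for q in range(5)])   # [3.0, 7.0, 12.0, 14.0, 21.0]
print(sum_fx, sum_fx2)                          # 22 56
```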


Calculating frequencies:

{=FREQUENCY(array1, array2)}

To get this to work you need:

(a) array1 with all the data in a single column
(b) array2, called the ‘bin’ array, listing the right-hand ends of the classes, e.g. 20, 40, 60, 80
(c) array3 (empty) marked where you want the frequencies to go.

This operation then returns an array of frequencies for x ≤ 20, 20 < x ≤ 40, 40 < x ≤ 60, 60 < x ≤ 80 and also x > 80, but you need to have marked an array first ready to receive this information. (Note: it is one cell longer than the ‘bin’ array2. This last cell is optional.)

NOTE: this formula generates an array, so Excel requires that you press SHIFT-CTRL-ENTER when you have finished editing the formula: this puts curly brackets round the formula.
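The counting rule behind FREQUENCY can be sketched in Python as follows. frequency() is a hypothetical stand-in, written on the assumption that each value is counted in the first class whose right-hand end is greater than or equal to it, with the extra final cell collecting values above the last bin.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Sketch of FREQUENCY(array1, array2): counts for x <= bins[0],
    bins[0] < x <= bins[1], ..., plus one extra cell for x > bins[-1]."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)      # one more cell than the bin array
    for x in data:
        # index of the first bin value >= x, i.e. the class whose right-hand end covers x
        counts[bisect_left(bins, x)] += 1
    return counts


data = [5, 20, 21, 40, 55, 79, 80, 81, 99]   # hypothetical data
print(frequency(data, [20, 40, 60, 80]))      # [2, 2, 1, 2, 2]
```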
To draw a histogram in Excel:

This is related to the FREQUENCY function above, but can also be run without it. It is not really a histogram, as it only works properly if the classes are of equal width.

Tools ⇒ Data Analysis ⇒ Histogram
Input Range = raw data array (which can run over several columns)
Bin Range = array of upper class interval limits
Output Range = array to place the resulting frequency column

Tick ‘Chart Output’ to draw the histogram. Then double-click the shaded histogram section: ‘Format Data Series’ ⇒ Options ⇒ set Gap Width to zero.

[Figure: histogram drawn by Excel]
Using Autograph 


With a grouped data set entered (with or without underlying raw data) you can draw:

• Histogram (see below)
• Cumulative frequency diagram (frequency or percentile scale). On-screen measurements enable the quartiles and the median to be measured.

[Figure: cumulative frequency diagram drawn in Autograph]

• Box and whisker diagram
• Dot plot. The dot plot is useful for showing where the raw data points actually are, especially when drawn at the same time as another diagram, e.g. a box and whisker diagram or a histogram.
• Numerical statistics can be generated as text:
(a) summary statistics (mean, mode, quartiles, SD, range, etc., for raw data and for grouped data)
(b) tabulated results, including mid-interval values
(c) stem and leaf diagram (really only works for discrete integer data)

[Figure: an example of a stem and leaf diagram, generated as text in the ‘Results Box’]
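Because a stem and leaf diagram is really just text, it is easy to generate outside Autograph. This Python sketch (the function name is hypothetical) assumes non-negative integer data with stems of 10, in line with the remark that the diagram really only works for discrete integer data.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Text stem-and-leaf diagram for non-negative integers:
    stem = tens (x // 10), leaf = units digit (x % 10)."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # include empty stems
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return "\n".join(lines)


print(stem_and_leaf([23, 31, 25, 47, 42, 28, 35]))
```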
Illustrating discrete and continuous data on a histogram

The x-axis on most statistical diagrams is a continuous scale. Therefore it is important that discrete data is represented correctly.

The grouped data entry box in Autograph requires that the data is specified as continuous or discrete. The ‘unit’ should be set = 1 for integers, or 0.1 if data is ‘to the nearest 0.1’, etc.

A discrete data item, e.g. 53 (unit = 1), will be represented on the histogram as a region from 52.5 to 53.5. Similarly a class interval 20-29 is represented by the region 19.5-29.5, i.e. 20-30 shifted to the left by 0.5 (half the unit).

[Figure: histogram of discrete classes drawn in Autograph]

Cumulative frequency, frequency polygon and box and whisker diagrams are similarly displaced.
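The half-unit shift can be written as a one-line rule. In this Python sketch, plotted_region() and mid_interval() are hypothetical helper names; the rule itself (widen a discrete class by half a unit at each end) is taken from the text.

```python
def plotted_region(lower, upper, unit=1.0, discrete=True):
    """Region on the continuous x-axis used to draw a class.
    Discrete classes are widened by half a unit at each end."""
    if discrete:
        return lower - unit / 2, upper + unit / 2
    return lower, upper


def mid_interval(lower, upper, unit=1.0, discrete=True):
    a, b = plotted_region(lower, upper, unit, discrete)
    return (a + b) / 2


print(plotted_region(53, 53))                    # (52.5, 53.5)
print(plotted_region(20, 29))                    # (19.5, 29.5)
print(mid_interval(20, 29))                      # 24.5
print(plotted_region(20, 30, discrete=False))    # (20, 30)
```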


In the table of values option, the mid-interval value is used to calculate the mean and SD, and will take account of the nature of the data:

Continuous:

Class interval | Mid-interval value (x) | Class width | Frequency (f) | Cumulative frequency
0-20   | 10.0 | 20 | 0   | 0
20-30  | 25.0 | 10 | 10  | 10
30-40  | 35.0 | 10 | 130 | 140
40-45  | 42.5 | 5  | 90  | 230
45-50  | 47.5 | 5  | 55  | 285
50-100 | 75.0 | 50 | 75  | 360
Σf = 360   Σfx = 16 863   Σfx² = 874 031
Mean = 46.84   SD = 15.29

Discrete:

Class interval | Mid-interval value (x) | Class width | Frequency (f) | Cumulative frequency
0-19  | 9.5  | 20 | 0   | 0
20-29 | 24.5 | 10 | 10  | 10
30-39 | 34.5 | 10 | 130 | 140
40-44 | 42.0 | 5  | 90  | 230
45-49 | 47.0 | 5  | 55  | 285
50-99 | 74.5 | 50 | 75  | 360
Σf = 360   Σfx = 16 683   Σfx² = 857 259
Mean = 46.34 (= continuous mean − 0.5)   SD = 15.29 (unchanged)
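The mean and SD in the two tables can be reproduced from the mid-interval values. This Python sketch assumes the discrete classes are drawn from −0.5 to 19.5, 19.5 to 29.5, and so on, as described above; it shows the mean dropping by 0.5 while the SD is unchanged.

```python
import math


def grouped_mean_sd(classes, freqs):
    """Mean and SD using mid-interval values, as in the table of values."""
    mids = [(a + b) / 2 for a, b in classes]
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, mids))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    mean = sum_fx / n
    sd = math.sqrt(sum_fx2 / n - mean ** 2)
    return mean, sd


freqs = [0, 10, 130, 90, 55, 75]
continuous = [(0, 20), (20, 30), (30, 40), (40, 45), (45, 50), (50, 100)]
# Discrete classes 0-19, 20-29, ... are drawn from -0.5 to 19.5, 19.5 to 29.5, ...
discrete = [(a - 0.5, b - 0.5) for a, b in continuous]

mean_c, sd_c = grouped_mean_sd(continuous, freqs)
mean_d, sd_d = grouped_mean_sd(discrete, freqs)
print(f"{mean_c:.2f} {sd_c:.2f}")   # 46.84 15.29
print(f"{mean_d:.2f} {sd_d:.2f}")   # 46.34 15.29
```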


When selecting frequency density, you also need to specify the ‘per’ unit. The default value …

Illustrating frequency and frequency density on a histogram

This can be a difficult concept to get across, and a visual approach can be very effective.


Example: Consider a grouped data set entered into Autograph, defined by these variable-width class intervals:
0, 20, 30, 40, 45, 50, 100

and the following associated frequencies:
0, 10, 130, 90, 55, 75


If you choose to draw a histogram from this, the dialogue box asks you to choose ‘frequency’ or ‘frequency density’.

Frequency:

[Figure: histogram with frequency on the vertical axis]

Here, the data set is clearly being misrepresented: the mode is wrong and there is undue weight given to the final class.


Frequency Density:

[Figure: histogram with frequency density on the vertical axis]

Here the area of each bar of the histogram is a direct measurement of the frequency.
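Frequency density is frequency divided by class width. A short Python sketch for the example above, taking the ‘per’ unit as 1 (an assumption), shows why the frequency plot gets the mode wrong and why the area of each bar recovers the frequency.

```python
classes = [(0, 20), (20, 30), (30, 40), (40, 45), (45, 50), (50, 100)]
freqs = [0, 10, 130, 90, 55, 75]

widths = [b - a for a, b in classes]
densities = [f / w for f, w in zip(freqs, widths)]   # frequency per unit

modal_by_frequency = classes[freqs.index(max(freqs))]
modal_by_density = classes[densities.index(max(densities))]

print(densities)             # [0.0, 1.0, 13.0, 18.0, 11.0, 1.5]
print(modal_by_frequency)    # (30, 40): misleading with unequal widths
print(modal_by_density)      # (40, 45)
# Area of each bar = density * width = the original frequency
print([d * w for d, w in zip(densities, widths)])   # [0.0, 10.0, 130.0, 90.0, 55.0, 75.0]
```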


Example: Enter the following discrete raw data:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 23, 24, 24, 25, 25, 27, 31
and draw a histogram using the unequal classes 0, 10, 40 (i.e. 0-9 and 10-39). There are nine items in each class, so plotting ‘frequency’ gives two equal frequencies. Here is the same data plotted as a frequency density.

[Figure: the same data plotted as a frequency density]
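The same calculation for this raw data set, counting values into the discrete classes 0-9 and 10-39 (widths 10 and 30 with unit = 1), might look like this in Python:

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 23, 24, 24, 25, 25, 27, 31]
classes = [(0, 9), (10, 39)]        # discrete classes 0-9 and 10-39 (unit = 1)

counts = [sum(lo <= x <= hi for x in data) for lo, hi in classes]
widths = [hi - lo + 1 for lo, hi in classes]     # 10 and 30 units wide
densities = [c / w for c, w in zip(counts, widths)]

print(counts)      # [9, 9]
print(densities)   # [0.9, 0.3]
```

Equal frequencies, but the first bar is three times as tall once width is taken into account.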


CHAPTER 2 CORRELATION AND REGRESSION 


Using a spreadsheet 


The following functions are relevant to bivariate statistics: 


PEARSON(array-y, array-x)
returns the product moment correlation coefficient, r [PMCC]

CORREL(array-y, array-x)
also returns the PMCC, r; applied to the ranks of the data, it gives Spearman’s rank correlation coefficient

FORECAST(x, array-y, array-x)
estimates y(x) from the y-on-x regression line for the 2 arrays

INTERCEPT(array-y, array-x)
returns c in the y-on-x regression line y = mx + c

SLOPE(array-y, array-x)
returns m in the y-on-x regression line y = mx + c

DEVSQ(array)
returns the sum of the squares of deviations from the sample mean, Σ(x − x̄)²

RSQ(array-y, array-x)
returns R², the square of the PMCC

There is also a sophisticated facility to create a full regression analysis, including residuals and normal probability plots. Use: Tools ⇒ Data Analysis ⇒ Regression
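The bivariate functions above can be cross-checked with a short Python sketch. The data are hypothetical, and the tie rule in ranks() (tied values share the average of their ranks) is the usual convention for Spearman's coefficient rather than something taken from Excel.

```python
import math


def pmcc(xs, ys):
    """PEARSON / CORREL: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)      # DEVSQ(array-x)
    syy = sum((y - my) ** 2 for y in ys)      # DEVSQ(array-y)
    return sxy / math.sqrt(sxx * syy)


def slope_intercept(xs, ys):
    """SLOPE and INTERCEPT of the y-on-x least squares line y = mx + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx        # the line passes through the centroid (mx, my)


def ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


xs = [1, 2, 3, 4, 5]           # hypothetical data
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)
m, c = slope_intercept(xs, ys)
r_s = pmcc(ranks(xs), ranks(ys))   # Spearman = PMCC of the ranks
print(round(r, 4), round(m, 4), round(c, 4), round(r ** 2, 4), round(r_s, 4))
print(m * 6 + c)               # like FORECAST(6, ys, xs)
```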


The scatter diagram chart option:

If you select ‘XY (Scatter)’ and create the diagram, there are a number of useful options available:

Double-click on either axis to reformat that axis.
Double-click a data point to open the ‘Format Data Series’ dialogue box.
Click on a data point (they should all turn yellow), then right-click ⇒ ‘Add Trend Line’.
Choose Type: linear, and Options: forecast forward/back, display equation, display R².


[Figure: scatter diagram ‘BRAIN SIZE - IQ’, with trend line y = 1244.3x + 770610 and R² = 0.1496]

An example of a data set copied off an internet page into Excel. There were many columns of data: to select the two for this plot, press Ctrl as you select the second (non-adjacent) column.


Using Autograph 


With a bivariate data set in place, there are a number of options to help illustrate its properties.

[Figure: the ‘Results - [Data Set 1]’ box]

The statistics box (tick the option in the Edit Data Set dialogue box) gives all the standard results, and these will change dynamically if a point in the data set is altered (hold down Ctrl to move a point).

If the ‘Junior’ option is chosen (from View ⇒ Preferences) this box is simplified, and gives only the means and the ‘line of best fit’.


Here a set of data has a rogue point (which can be moved around by dragging with the mouse), and the centroid and line of best fit are also shown.

Here the principle of least squares regression is being illustrated, relating to a variable straight line through the centroid.

CHAPTER 3 PROBABILITY

Using a spreadsheet for probability

There are various ways to create random numbers in Excel:

=RAND()   Random number 0 ≤ x < 1
=INT(6*RAND())+1   Random integer 1 ≤ x ≤ 6
=RANDBETWEEN(a, b)   Random integer a ≤ x ≤ b

NOTE: If RANDBETWEEN does not work, use Tools ⇒ Add-Ins and tick ‘Analysis ToolPak’.

Example: Simulation of the sum of two dice:

      A        B     C
1     DICE 1         =INT(6*RAND())+1
2     DICE 2         =INT(6*RAND())+1
3     SUM            =C1+C2
4              x     f
5     Score    2     =C5+(B5=$C$3)
6     Score    3
7     Score    4
...
15    Score    12

Fill down, mark B4:C15 and plot.

The logical statement in cell C5, (B5=$C$3), equals 1 if TRUE, otherwise 0. This is a simple way to add 1 to a total if a condition is met.

However, a cell formula referring to itself is called a ‘circular reference’, and you need to set the following to make it work in this instance:
Tools ⇒ Options ⇒ Calculation tab ⇒ tick Iteration ⇒ Max iterations = 1 (leave Max change as it is)

Then hold down F9 to run the simulation.

[Figure: column chart ‘SUM of 2-DICE’, scores 2 to 12]

To make the x-axis work properly in this chart, proceed as follows: choose the ‘Column’ chart type and observe that it is plotting ‘x’ and ‘f’ against the row number. Click ‘Next’ ⇒ ‘Chart Source Data’ ⇒ ‘Series’, and select ‘x’ (which is plotting on the wrong axis). Copy and paste its array into the ‘Category (X) axis labels’ slot, then click ‘Remove’ to take it off the y-axis list.
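Outside a spreadsheet, the circular-reference tally can be replaced by an ordinary loop. This Python sketch mirrors =INT(6*RAND())+1 for each die; the function name, the number of throws and the seed are illustrative choices.

```python
import random
from collections import Counter


def simulate_two_dice(throws, seed=None):
    """Tally the sum of two dice over many throws; each die is
    INT(6*RAND())+1, as in the spreadsheet."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(throws):
        die1 = int(6 * rng.random()) + 1
        die2 = int(6 * rng.random()) + 1
        tally[die1 + die2] += 1
    return [tally[score] for score in range(2, 13)]


freqs = simulate_two_dice(36000, seed=1)
for score, f in zip(range(2, 13), freqs):
    print(f"{score:2d} {f:5d} " + "#" * (f // 200))   # crude text bar chart
```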


Internet resources for probability 


1. This site has teaching resources from Coventry University, covering many aspects of school level probability and statistics. Simulations available include one on Buffon’s needle.

[Figure: the DISCUSS site]

2. The Chance and Data site from Tasmania. This has an excellent probability section which links probability theory to items in newspapers.

[Figure: ‘Chance and Data in the News’ main index]
"Main Index 


From the Autograph ‘extras’:

1. Monte-Carlo simulation

This simulation is based on the probability of a random point within a square landing within a unit circle. The probability is π/4. This simulation is very slow, but is a good illustration of randomness.
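A Python version of the Monte-Carlo idea, assuming the square is −1 ≤ x, y ≤ 1 so that the unit circle fits exactly inside it (the ratio of areas is then π/4), is sketched below. Seeds are fixed only so that runs can be repeated.

```python
import math
import random


def estimate_pi(points, seed=None):
    """Monte Carlo: the proportion of random points in the square
    -1 <= x, y <= 1 that land inside the unit circle estimates pi/4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(points):
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside += 1
    return 4 * inside / points


for n in (100, 1000, 10000, 100000):
    print(n, estimate_pi(n, seed=2), math.pi)
```

The estimates wander noticeably even for large numbers of points, which is the point of the illustration.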




2. Dice throwing

This is similar to the spreadsheet example above, but more automatic to use. Options include:

(a) the sum of 2 dice
(b) the difference between 2 dice
(c) throwing one die
(d) throwing n dice


CHAPTERS 4 AND 5 DISCRETE RANDOM VARIABLES 
Using a spreadsheet 


The following formulae are available for working with discrete random variables in Excel:

BINOMDIST(r, n, p, T)
e.g. T = 0: X ~ B(10, 0.5), BINOMDIST(2, 10, 0.5, 0) = P(X = 2)
e.g. T = 1 (cumulative): BINOMDIST(2, 10, 0.5, 1) = P(X ≤ 2)

POISSON(x, m, T)
e.g. T = 0: X ~ Po(4), POISSON(2, 4, 0) = P(X = 2)
e.g. T = 1 (cumulative): POISSON(2, 4, 1) = P(X ≤ 2)
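As a cross-check on BINOMDIST and POISSON, here is a Python sketch of the same probabilities using math.comb (Python 3.8 or later). The function names simply echo the Excel ones; they are not a Python library API, and the value of p in the table loop is hypothetical.

```python
import math


def binomdist(r, n, p, cumulative):
    """Like BINOMDIST(r, n, p, T): P(X = r) if T is 0/False, else P(X <= r)."""
    if cumulative:
        return sum(binomdist(k, n, p, False) for k in range(r + 1))
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def poisson(x, mean, cumulative):
    """Like POISSON(x, m, T): P(X = x) if T is 0/False, else P(X <= x)."""
    if cumulative:
        return sum(poisson(k, mean, False) for k in range(x + 1))
    return math.exp(-mean) * mean ** x / math.factorial(x)


print(binomdist(2, 10, 0.5, False), binomdist(2, 10, 0.5, True))
print(poisson(2, 4, False), poisson(2, 4, True))

# A table like the worksheet: x = 0..10, P(X = x) and P(X <= x)
p = 0.5                                        # hypothetical value of the named cell p
for x in range(11):
    print(x, round(binomdist(x, 10, p, False), 4), round(binomdist(x, 10, p, True), 4))
```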


Example: To produce the distribution and the cumulative distribution for X ~ B(10, p):

Name A2 ‘n’ and name B2 ‘p’.
Enter C2 = 0. To create 0-10 in C2-C12, use Edit ⇒ Fill ⇒ Series.
Formula in D2: =BINOMDIST(x, n, p, 0), or for the cumulative distribution: =BINOMDIST(x, n, p, 1)
Note ‘x’ is the column heading in C1, and this can be used in the formula.
Fill down D2-D12 (double-click on the fill handle of cell D2).


To put in a slider to control p:

Right-click over a toolbar ⇒ Forms ⇒ Scroll Bar. Drag it into position. Right-click: Format Control.
Unfortunately this slider only works with integers, so you need the dummy cell, B10: set the slider to vary from 0-100, and set p = B10/100.


Using Autograph 
The following discrete probability distributions are available in Autograph: 


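• Rectangular: $X \sim R(a, b)$, $r = a, \ldots, b$
  $P(X = r) = 1/(b - a + 1)$
  Mean, $\mu = (a + b)/2$
  Variance, $\sigma^2 = (b - a)(b - a + 2)/12$
• Binomial: $X \sim B(n, p)$, $r = 0, 1, 2, \ldots, n$
  $P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}$
  Mean, $\mu = np$
  Variance, $\sigma^2 = npq$
• Poisson: $X \sim Po(\lambda)$, $r = 0, 1, 2, \ldots$
  $P(X = r) = \lambda^{r} e^{-\lambda}/r!$
  Mean, $\mu = \lambda$
  Variance, $\sigma^2 = \lambda$
  $P(X = r + 1) = P(X = r)\,\lambda/(r + 1)$
  also the distribution $Po(np) \approx B(n, p)$
• Geometric: $X \sim G(p)$, $r = 1, 2, 3, \ldots$
  $P(X = r) = q^{r-1} p$
  Mean, $\mu = 1/p$
  Variance, $\sigma^2 = q/p^2$
  $P(X = r + 1) = P(X = r)\,q$
• User defined:
  Mean, $\mu = \sum r\,P(X = r)$
  Variance, $\sigma^2 = \sum r^2\,P(X = r) - \mu^2$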
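The mean and variance formulas above, including the user-defined case, can be checked exactly with fractions. This Python sketch (mean_variance() is a hypothetical helper) confirms the rectangular variance (b − a)(b − a + 2)/12 for a fair die, and mean np, variance npq for a small binomial case.

```python
from fractions import Fraction
from math import comb


def mean_variance(dist):
    """User-defined distribution: dist maps r -> P(X = r).
    mu = sum r.P(X = r), sigma^2 = sum r^2.P(X = r) - mu^2."""
    mu = sum(r * p for r, p in dist.items())
    var = sum(r * r * p for r, p in dist.items()) - mu ** 2
    return mu, var


# Rectangular R(a, b): P(X = r) = 1/(b - a + 1), checked with exact fractions
a, b = 1, 6                      # e.g. a fair die
rect = {r: Fraction(1, b - a + 1) for r in range(a, b + 1)}
print(mean_variance(rect))       # (Fraction(7, 2), Fraction(35, 12))

# Binomial B(n, p): mean np, variance npq
n, p = 4, Fraction(1, 3)
binom = {r: comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)}
print(mean_variance(binom))      # (Fraction(4, 3), Fraction(8, 9))
```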




Table of Values of Po(2.5): μ = 2.5, σ² = 2.5

r | P(X = r) | P(X ≤ r) | P(X ≥ r)
0 | 0.08208  | 0.08208  | 1
1 | 0.2052   | 0.2873   | 0.9179
2 | 0.2565   | 0.5438   | 0.7127
3 | 0.2138   | 0.7576   | 0.4562
4 | 0.1336   | 0.8912   | 0.2424
5 | 0.0668   | 0.958    | 0.1088
6 | 0.02783  | 0.9858   | 0.04202
7 | 0.009941 | 0.9958   | 0.01419
8 | 0.003106 | 0.9989   | 0.004247
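The Po(2.5) table above can be regenerated with the recurrence P(X = r + 1) = P(X = r)·λ/(r + 1). A minimal Python sketch, where P(X ≥ r) is found as 1 − P(X ≤ r − 1):

```python
import math


def poisson_table(lam, r_max):
    """Rows (r, P(X = r), P(X <= r), P(X >= r)) for X ~ Po(lam), built with
    the recurrence P(X = r + 1) = P(X = r) * lam / (r + 1)."""
    rows = []
    prob = math.exp(-lam)        # P(X = 0)
    cum = 0.0                    # P(X <= r - 1)
    for r in range(r_max + 1):
        upper = 1 - cum          # P(X >= r) = 1 - P(X <= r - 1)
        cum += prob
        rows.append((r, prob, cum, upper))
        prob *= lam / (r + 1)
    return rows


rows = poisson_table(2.5, 8)
for r, prob, cum, upper in rows:
    print(f"{r} {prob:.4g} {cum:.4g} {upper:.4g}")
```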


CHAPTERS 6, 7 AND 8 CONTINUOUS DISTRIBUTIONS AND THE NORMAL DISTRIBUTION


Using a spreadsheet 


The following normal distribution formulae are available in Excel:

NORMDIST(x, m, s, T)
For X ~ N(m, s²):
with T = 0 this returns the value of the pdf
with T = 1 this returns P(X ≤ x)

NORMINV(p, m, s)
For X ~ N(m, s²), this returns x such that P(X ≤ x) = p

NORMSDIST(z)
For Z ~ N(0, 1), this returns P(Z ≤ z), where Z = (X − m)/s

NORMSINV(p)
returns z such that P(Z ≤ z) = p

STANDARDIZE(x, m, s)
returns z = (x − m)/s

NOTE: the parameter used in the normal distribution formulae is the standard deviation (s) and not the variance (s²).


[Figure: spreadsheet Normal Distribution calculator, using cells named ‘y’ and ‘p’ and formulae such as (y−m)/s, NORMDIST(y,m,s,1), NORMINV(p/100,m,s) and NORMINV(1−p/100,m,s)]


In the example, the NORMDIST and NORMINV formulae have been used to create a general Normal Distribution calculator. The ‘C’ column explains the entries and formulae that have been used. Notice the extensive use of named cells: this makes all subsequent formulae so much more friendly.
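Python's standard library offers equivalents of these calculator functions through statistics.NormalDist (Python 3.8+). The sketch below uses X ~ N(500, 100²) purely as an illustration; as in Excel, the second parameter is the standard deviation, not the variance.

```python
from statistics import NormalDist

m, s = 500, 100                 # hypothetical X ~ N(500, 100^2); s is the SD
X = NormalDist(mu=m, sigma=s)
Z = NormalDist()                # Z ~ N(0, 1)

x = 600
print(X.pdf(x))                 # like NORMDIST(x, m, s, 0): value of the pdf
print(X.cdf(x))                 # like NORMDIST(x, m, s, 1): P(X <= x)
print(X.inv_cdf(0.975))         # like NORMINV(0.975, m, s)
z = (x - m) / s                 # like STANDARDIZE(x, m, s)
print(Z.cdf(z))                 # like NORMSDIST(z), equal to X.cdf(x)
print(Z.inv_cdf(0.975))         # like NORMSINV(0.975), about 1.96
```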




Using Autograph 
The following continuous probability distributions are available in Autograph: 
• Uniform: $X \sim U(a, b)$, $a \le x \le b$
  $f(x) = 1/(b - a)$
  Mean, $\mu = (a + b)/2$
  Variance, $\sigma^2 = (b - a)^2/12$
• Normal: $X \sim N(\mu, \sigma^2)$
  $z = (x - \mu)/\sigma$, $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-z^2/2}$
• User defined:
  Mean, $\mu = \int x\,f(x)\,dx$
  Variance, $\sigma^2 = \int x^2 f(x)\,dx - \mu^2$
Example: A continuous function $f(x) = x^2$, $-2 \le x \le 2$


The important principle to appreciate is that the total area must = 1. Therefore the function to be plotted must be $f(x) = kx^2$, where $k = 1\big/\int_{-2}^{2} x^2\,dx$ (here $\int_{-2}^{2} x^2\,dx = 16/3$, so $k = 3/16$). Autograph automatically converts any f(x) entry to k.f(x), and so areas represent probability. Areas can be measured on-screen by entering limits. By dragging the limits around, it can be seen that the total area = 1.
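The normalising constant k and the user-defined mean and variance can also be found numerically. This Python sketch uses composite Simpson's rule (a swapped-in technique; it is not a claim about how Autograph integrates) for f(x) = x² on −2 ≤ x ≤ 2, where the exact answers are k = 3/16, μ = 0 and σ² = 12/5.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) strips."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def g(x):
    return x ** 2            # the function as entered, on -2 <= x <= 2


a, b = -2, 2
k = 1 / simpson(g, a, b)     # scale factor so the total area is 1 (exactly 3/16)


def pdf(x):
    return k * g(x)


mean = simpson(lambda x: x * pdf(x), a, b)
variance = simpson(lambda x: x * x * pdf(x), a, b) - mean ** 2
print(f"{k:.4f} {variance:.4f}")   # 0.1875 2.4000
```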


Example: X ~ N(500, 100²): here areas between limits and inverse calculations are possible. The parameters μ and σ² can also be varied dynamically.

[Figure: normal distribution curves, including N(500, 100²), drawn in Autograph]
CHAPTER 9 SAMPLING AND ESTIMATION

Using a spreadsheet

Excel includes a feature which can generate a sample of random data from a number of distributions. The choice of distributions is:

Uniform (a, b) [continuous values between a and b, unlike RANDBETWEEN(a, b), which gives integers]
Normal (m, s)
Bernoulli (p)
Binomial (n, p)
Poisson (m)
User-defined Discrete [2 cols: x, P(X = x)]


[Figure: spreadsheet showing a random sample, the sample means, and the mean and SD of the sample means]

Example: To create a large sample data set from B(10, 0.5) and take samples of size 5 from it.

To achieve the above, use Tools ⇒ Data Analysis ⇒ Random Number Generation. Leave ‘Number of Variables’ (= columns) and ‘Number of Random Numbers’ blank, click on ‘Output Range’ and drag out the area for the data. Choose ‘Binomial’, and enter the ‘p Value’ and ‘Number of Trials’. Leave ‘Random Seed’ blank (entering a seed would produce exactly the same set of random numbers again).
Click ‘OK’. Then, to create a frequency chart, first set up a ‘bin array’ as a column of figures 0-10, and use the formula {=FREQUENCY(data array, bin array)} on a new array next to the bin array. Don’t forget SHIFT-CTRL-ENTER! Select the bin array and the frequency array and draw a bar chart.


To create a random sample from this sample, use Tools ⇒ Data Analysis ⇒ Sampling. The ‘Input Range’ is the data. Use ‘Random’ with n = 5, and mark the output range. Unfortunately there is no easy way to create many such samples.

After five such samples it is useful to compare the mean and SD of the sample means with μ and σ/√n calculated from the original data set.
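The experiment can be repeated many times over in Python, which gets round the difficulty of creating many samples. This sketch builds a parent data set from B(10, 0.5) and samples with replacement, assumed here to match Excel's ‘Random’ sampling option; the sizes and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)

# Large parent data set from B(10, 0.5): successes in 10 trials, 2000 times
population = [sum(rng.random() < 0.5 for _ in range(10)) for _ in range(2000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Many samples of size 5, drawn with replacement
n = 5
sample_means = [statistics.mean(rng.choices(population, k=n)) for _ in range(1000)]

print(f"mu = {mu:.3f}, sigma/sqrt(n) = {sigma / n ** 0.5:.3f}")
print(f"mean of sample means = {statistics.mean(sample_means):.3f}, "
      f"SD of sample means = {statistics.stdev(sample_means):.3f}")
```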


Using Autograph 


Use New Statistics Page ⇒ Add Grouped Data ⇒ Use Raw Data ⇒ Edit Raw Data ⇒ Select Distribution. There is the option to create a set of random data from the following probability distributions:

Rectangular (a, b) - discrete or continuous
Binomial (n, p)
Poisson (λ)
Geometric (p)
Normal (μ, σ²)
User defined continuous f(x)

(‘User defined discrete’ is not yet implemented.)

Use ‘Edit Distrib.’ to enter the parameters. Enter the sample size N, and press ‘Create Sample’. Click ‘OK’, then ‘Suggest Intervals’ (amend if necessary), then click ‘Continuous’ or ‘Discrete’ as appropriate, then ‘OK’.


With a data set in place, select ‘histogram’ then ‘autoscale’. Then choose ‘Sample Means’. In the ‘Edit Sample Means’ dialogue box, enter the sample size (e.g. n = 5). You can then

(a) take samples one at a time, in which case the actual samples are indicated on the diagram together with their mean.

(b) take many samples (e.g. 100), in which case a dot plot is created.


The Central Limit Theorem is very effectively demonstrated with almost any parent 
population. 
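The Central Limit Theorem can be seen numerically as well. This Python sketch takes a strongly skewed parent, the geometric distribution G(0.3), and compares the skewness of the parent with that of means of samples of size 30; it also compares the SD of the sample means with √q/(p√n). The parameter values and seed are arbitrary.

```python
import random
import statistics


def geometric(rng, p):
    """Number of trials up to and including the first success: r = 1, 2, 3, ..."""
    r = 1
    while rng.random() >= p:
        r += 1
    return r


def skewness(values):
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


rng = random.Random(7)
p, n = 0.3, 30
parent = [geometric(rng, p) for _ in range(20000)]
sample_means = [statistics.fmean(geometric(rng, p) for _ in range(n)) for _ in range(4000)]

print(f"skewness: parent {skewness(parent):.2f}, sample means {skewness(sample_means):.2f}")
print(f"SD of sample means {statistics.stdev(sample_means):.3f}, "
      f"theory sqrt(q)/(p sqrt(n)) = {(1 - p) ** 0.5 / p / n ** 0.5:.3f}")
```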


CHAPTERS 10 AND 11 HYPOTHESIS TESTS 


Using spreadsheets 


Using Excel, the following formulae are useful when investigating hypothesis tests:

BINOMDIST(r, n, p, T), e.g. for X ~ B(10, 0.5):
T = 0: BINOMDIST(2, 10, 0.5, 0) = P(X = 2)
T = 1: BINOMDIST(2, 10, 0.5, 1) = P(X ≤ 2)

CRITBINOM(n, p, test), e.g. for X ~ B(n, p):
This finds the smallest x such that P(X ≤ x) ≥ test

POISSON(x, m, T), e.g. for X ~ Po(4):
T = 0: POISSON(2, 4, 0) = P(X = 2)
T = 1: POISSON(2, 4, 1) = P(X ≤ 2)

NORMDIST(x, m, s, T), e.g. for X ~ N(m, s²):
T = 0 ⇒ value of the pdf (for plotting the curve)
T = 1 ⇒ P(X ≤ x)

NORMINV(p, m, s), e.g. for X ~ N(m, s²):
This returns the value x such that P(X ≤ x) = p

NORMSDIST(z)
returns the probability P(Z ≤ z)

NORMSINV(p)
returns z such that P(Z ≤ z) = p

STANDARDIZE(x, m, s) ⇒ z = (x − m)/s


Example: A tough driving examiner claims to pass only 20% of his candidates. After 25 tests, what is the smallest number of passes required to refute this claim at 5%?

Answer: CRITBINOM(25, 0.2, 0.95) = 8

This means P(X ≤ 8) ≥ 95% whereas P(X ≤ 7) < 95%, so P(X ≥ 9) ≤ 5%: 9 or more passes would refute the claim.
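The critical value can be found by a direct search of the binomial cumulative distribution. This Python sketch (critbinom() is a hypothetical reimplementation of the rule quoted above) takes n = 25, as in the CRITBINOM call, with H₀: p = 0.2 against H₁: p > 0.2.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def critbinom(n, p, test):
    """Smallest x such that P(X <= x) >= test, as described for CRITBINOM."""
    for x in range(n + 1):
        if binom_cdf(x, n, p) >= test:
            return x
    return n


n, p = 25, 0.2
c = critbinom(n, p, 0.95)
print(c, round(binom_cdf(c, n, p), 4))                       # 8 0.9532
print("P(X >= 9) =", round(1 - binom_cdf(c, n, p), 4))       # 0.0468
print("P(X >= 8) =", round(1 - binom_cdf(c - 1, n, p), 4))   # 0.1091
```

The upper tail is what matters here, because refuting ‘only 20%’ needs evidence of more passes, not fewer.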
CONFIDENCE(α, s, n)

This returns the half-width of the confidence interval for the mean, for a sample of size n from a population with SD = s, at significance level α. Unfortunately, ‘α’ is a measure of the probability of lying outside the intervals, so for 95% confidence, α = 1 − 95/100.

Example: CONFIDENCE(1 − 95/100, 2.5, 50) = 0.69
C.I. = sample mean ± 0.69 at 95%

CHITEST(array-1, array-2)
array-1 = actual frequencies
array-2 = expected frequencies
returns the probability (p-value) associated with the χ² statistic for the two arrays, rather than the χ² value itself

This enables a set of actual data (frequencies) to be tested against frequencies calculated from an underlying probability distribution.

Using Autograph

Example: Hypothesis testing on discrete probability distributions

[Figure: X ~ B(25, 0.2), showing P(X ≥ 8) = 10.9% and P(X ≥ 9) = 4.7%]

Here H₀ is p = 0.2 under B(25, 0.2) and H₁ is p > 0.2. If x ≥ 9, H₀ is rejected. The boundary limits can be dragged up and down the x-axis.

Example: Hypothesis testing on continuous probability distributions

[Figure: two normal distributions, with a boundary at x = 18.82]

This is an illustration of Type 1 error (when H₀ is true but is rejected) and Type 2 error (when H₀ is accepted but H₁ is true) using two normal distributions.
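The two spreadsheet calculations can be sketched in Python as follows. confidence() assumes the half-width z·s/√n with P(|Z| > z) = α; chi_squared_statistic() stops at the χ² value itself, since converting it to a probability, as CHITEST does, needs the χ² distribution with the right degrees of freedom. The die data are hypothetical.

```python
import math
from statistics import NormalDist


def confidence(alpha, sd, n):
    """Half-width z * sd / sqrt(n), where P(|Z| > z) = alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sd / math.sqrt(n)


def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E; CHITEST would go on to turn this into a probability."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


half_width = confidence(1 - 95 / 100, 2.5, 50)
print(round(half_width, 2))                                   # 0.69

observed = [8, 12, 9, 11, 6, 14]    # hypothetical die results
expected = [10] * 6
print(round(chi_squared_statistic(observed, expected), 4))    # 4.2
```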
Example: Fitting a probability distribution to a data set.

Autograph will find the best parameters to fit a binomial, Poisson or normal distribution. First draw a histogram, then a normal distribution (any parameters), then choose ‘fit to data’. The probability distribution will only appear to be a good fit if the frequency density scale is chosen so that the total area on both diagrams = 1.
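Autograph's own fitting method is not described here; one simple approach, assumed for this sketch, is to match the mean and SD of the grouped data and then compare frequency densities class by class. In Python, using the grouped data set from Chapter 1:

```python
from statistics import NormalDist

classes = [(0, 20), (20, 30), (30, 40), (40, 45), (45, 50), (50, 100)]
freqs = [0, 10, 130, 90, 55, 75]
total = sum(freqs)

# Assumed fitting rule: match the mean and SD of the grouped data
mids = [(a + b) / 2 for a, b in classes]
mean = sum(f * x for f, x in zip(freqs, mids)) / total
sd = (sum(f * x * x for f, x in zip(freqs, mids)) / total - mean ** 2) ** 0.5
fitted = NormalDist(mean, sd)

# Expected frequency in each class = total * P(a < X < b)
expected = [total * (fitted.cdf(b) - fitted.cdf(a)) for a, b in classes]
for (a, b), f, e in zip(classes, freqs, expected):
    # compare frequency densities, the scale on which the curve is drawn
    print(f"{a}-{b}: observed {f / (b - a):.2f}, fitted {e / (b - a):.2f}")
```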


Appendix 


CUMULATIVE BINOMIAL PROBABILITIES

The tabulated value is P(X ≤ r) where X ~ B(n, p)

          p = | 0.05   | 0.10   | 0.15   | 0.20   | 0.25   | 0.30   | 0.35   | 0.40   | 0.45   | 0.50
n = 2   r = 0 | 0.9025 | 0.8100 | 0.7225 | 0.6400 | 0.5625 | 0.4900 | 0.4225 | 0.3600 | 0.3025 | 0.2500
        r = 1 | 0.9975 | 0.9900 | 0.9775 | 0.9600 | 0.9375 | 0.9100 | 0.8775 | 0.8400 | 0.7975 | 0.7500
        r = 2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000

n = 3   r = 0 | 0.8574 | 0.7290 | 0.6141 | 0.5120 | 0.4219 | 0.3430 | 0.2746 | 0.2160 | 0.1664 | 0.1250
        r = 1 | 0.9928 | 0.9720 | 0.9393 | 0.8960 | 0.8438 | 0.7840 | 0.7183 | 0.6480 | 0.5748 | 0.5000
        r = 2 | 0.9999 | 0.9990 | 0.9966 | 0.9920 | 0.9844 | 0.9730 | 0.9571 | 0.9360 | 0.9089 | 0.8750
        r = 3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000

n = 4   r = 0 | 0.8145 | 0.6561 | 0.5220 | 0.4096 | 0.3164 | 0.2401 | 0.1785 | 0.1296 | 0.0915 | 0.0625
        r = 1 | 0.9860 | 0.9477 | 0.8905 | 0.8192 | 0.7383 | 0.6517 | 0.5630 | 0.4752 | 0.3910 | 0.3125
        r = 2 | 0.9995 | 0.9963 | 0.9880 | 0.9728 | 0.9492 | 0.9163 | 0.8735 | 0.8208 | 0.7585 | 0.6875
        r = 3 | 1.0000 | 0.9999 | 0.9995 | 0.9984 | 0.9961 | 0.9919 | 0.9850 | 0.9744 | 0.9590 | 0.9375
        r = 4 |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000

n = 5   r = 0 | 0.7738 | 0.5905 | 0.4437 | 0.3277 | 0.2373 | 0.1681 | 0.1160 | 0.0778 | 0.0503 | 0.0313
        r = 1 | 0.9774 | 0.9185 | 0.8352 | 0.7373 | 0.6328 | 0.5282 | 0.4284 | 0.3370 | 0.2562 | 0.1875
        r = 2 | 0.9988 | 0.9914 | 0.9734 | 0.9421 | 0.8965 | 0.8369 | 0.7648 | 0.6826 | 0.5931 | 0.5000
        r = 3 | 1.0000 | 0.9995 | 0.9978 | 0.9933 | 0.9844 | 0.9692 | 0.9460 | 0.9130 | 0.8688 | 0.8125
        r = 4 |        | 1.0000 | 0.9999 | 0.9997 | 0.9990 | 0.9976 | 0.9947 | 0.9898 | 0.9815 | 0.9688
        r = 5 |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000

n = 6   r = 0 | 0.7351 | 0.5314 | 0.3771 | 0.2621 | 0.1780 | 0.1176 | 0.0754 | 0.0467 | 0.0277 | 0.0156
        r = 1 | 0.9672 | 0.8857 | 0.7765 | 0.6554 | 0.5339 | 0.4202 | 0.3191 | 0.2333 | 0.1636 | 0.1094
        r = 2 | 0.9978 | 0.9842 | 0.9527 | 0.9011 | 0.8306 | 0.7443 | 0.6471 | 0.5443 | 0.4415 | 0.3438
        r = 3 | 0.9999 | 0.9987 | 0.9941 | 0.9830 | 0.9624 | 0.9295 | 0.8826 | 0.8208 | 0.7447 | 0.6563
        r = 4 | 1.0000 | 0.9999 | 0.9996 | 0.9984 | 0.9954 | 0.9891 | 0.9777 | 0.9590 | 0.9308 | 0.8906
        r = 5 |        | 1.0000 | 1.0000 | 0.9999 | 0.9998 | 0.9993 | 0.9982 | 0.9959 | 0.9917 | 0.9844
        r = 6 |        |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000

n = 7   r = 0 | 0.6983 | 0.4783 | 0.3206 | 0.2097 | 0.1335 | 0.0824 | 0.0490 | 0.0280 | 0.0152 | 0.0078
        r = 1 | 0.9556 | 0.8503 | 0.7166 | 0.5767 | 0.4449 | 0.3294 | 0.2338 | 0.1586 | 0.1024 | 0.0625
        r = 2 | 0.9962 | 0.9743 | 0.9262 | 0.8520 | 0.7564 | 0.6471 | 0.5323 | 0.4199 | 0.3164 | 0.2266
        r = 3 | 0.9998 | 0.9973 | 0.9879 | 0.9667 | 0.9294 | 0.8740 | 0.8002 | 0.7102 | 0.6083 | 0.5000
        r = 4 | 1.0000 | 0.9998 | 0.9988 | 0.9953 | 0.9871 | 0.9712 | 0.9444 | 0.9037 | 0.8471 | 0.7734
        r = 5 |        | 1.0000 | 0.9999 | 0.9996 | 0.9987 | 0.9962 | 0.9910 | 0.9812 | 0.9643 | 0.9375
        r = 6 |        |        | 1.0000 | 1.0000 | 0.9999 | 0.9998 | 0.9994 | 0.9984 | 0.9963 | 0.9922
        r = 7 |        |        |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000

n = 8   r = 0 | 0.6634 | 0.4305 | 0.2725 | 0.1678 | 0.1001 | 0.0576 | 0.0319 | 0.0168 | 0.0084 | 0.0039
        r = 1 | 0.9428 | 0.8131 | 0.6572 | 0.5033 | 0.3671 | 0.2553 | 0.1691 | 0.1064 | 0.0632 | 0.0352
        r = 2 | 0.9942 | 0.9619 | 0.8948 | 0.7969 | 0.6785 | 0.5518 | 0.4278 | 0.3154 | 0.2201 | 0.1445
        r = 3 | 0.9996 | 0.9950 | 0.9786 | 0.9437 | 0.8862 | 0.8059 | 0.7064 | 0.5941 | 0.4770 | 0.3633
        r = 4 | 1.0000 | 0.9996 | 0.9971 | 0.9896 | 0.9727 | 0.9420 | 0.8939 | 0.8263 | 0.7396 | 0.6367
        r = 5 |        | 1.0000 | 0.9998 | 0.9988 | 0.9958 | 0.9887 | 0.9747 | 0.9502 | 0.9115 | 0.8555
        r = 6 |        |        | 1.0000 | 0.9999 | 0.9996 | 0.9987 | 0.9964 | 0.9915 | 0.9819 | 0.9648
        r = 7 |        |        |        | 1.0000 | 1.0000 | 0.9999 | 0.9998 | 0.9993 | 0.9983 | 0.9961
        r = 8 |        |        |        |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000


          p = | 0.05   | 0.10   | 0.15   | 0.20   | 0.25   | 0.30   | 0.35   | 0.40   | 0.45   | 0.50
n = 9   r = 0 | 0.6302 | 0.3874 | 0.2316 | 0.1342 | 0.0751 | 0.0404 | 0.0207 | 0.0101 | 0.0046 | 0.0020
        r = 1 | 0.9288 | 0.7748 | 0.5995 | 0.4362 | 0.3003 | 0.1960 | 0.1211 | 0.0705 | 0.0385 | 0.0195
        r = 2 | 0.9916 | 0.9470 | 0.8591 | 0.7382 | 0.6007 | 0.4628 | 0.3373 | 0.2318 | 0.1495 | 0.0898
        r = 3 | 0.9994 | 0.9917 | 0.9661 | 0.9144 | 0.8343 | 0.7297 | 0.6089 | 0.4826 | 0.3614 | 0.2539
        r = 4 | 1.0000 | 0.9991 | 0.9944 | 0.9804 | 0.9511 | 0.9012 | 0.8283 | 0.7334 | 0.6214 | 0.5000
        r = 5 |        | 0.9999 | 0.9994 | 0.9969 | 0.9900 | 0.9747 | 0.9464 | 0.9006 | 0.8342 | 0.7461
        r = 6 |        | 1.0000 | 1.0000 | 0.9997 | 0.9987 | 0.9957 | 0.9888 | 0.9750 | 0.9502 | 0.9102
        r = 7 |        |        |        | 1.0000 | 0.9999 | 0.9996 | 0.9986 | 0.9962 | 0.9909 | 0.9805
        r = 8 |        |        |        |        | 1.0000 | 1.0000 | 0.9999 | 0.9997 | 0.9992 | 0.9980
        r = 9 |        |        |        |        |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000

n = 10  r = 0 | 0.5987 | 0.3487 | 0.1969 | 0.1074 | 0.0563 | 0.0282 | 0.0135 | 0.0060 | 0.0025 | 0.0010
        r = 1 | 0.9139 | 0.7361 | 0.5443 | 0.3758 | 0.2440 | 0.1493 | 0.0860 | 0.0464 | 0.0233 | 0.0107
        r = 2 | 0.9885 | 0.9298 | 0.8202 | 0.6778 | 0.5256 | 0.3828 | 0.2616 | 0.1673 | 0.0996 | 0.0547
        r = 3 | 0.9990 | 0.9872 | 0.9500 | 0.8791 | 0.7759 | 0.6496 | 0.5138 | 0.3823 | 0.2660 | 0.1719
        r = 4 | 0.9999 | 0.9984 | 0.9901 | 0.9672 | 0.9219 | 0.8497 | 0.7515 | 0.6331 | 0.5044 | 0.3770
        r = 5 | 1.0000 | 0.9999 | 0.9986 | 0.9936 | 0.9803 | 0.9527 | 0.9051 | 0.8338 | 0.7384 | 0.6230
        r = 6 |        | 1.0000 | 0.9999 | 0.9991 | 0.9965 | 0.9894 | 0.9740 | 0.9452 | 0.8980 | 0.8281
        r = 7 |        |        | 1.0000 | 0.9999 | 0.9996 | 0.9984 | 0.9952 | 0.9877 | 0.9726 | 0.9453
        r = 8 |        |        |        | 1.0000 | 1.0000 | 0.9999 | 0.9995 | 0.9983 | 0.9955 | 0.9893
        r = 9 |        |        |        |        |        | 1.0000 | 1.0000 | 0.9999 | 0.9997 | 0.9990
       r = 10 |        |        |        |        |        |        |        | 1.0000 | 1.0000 | 1.0000

n = 15  r = 0 | 0.4633 | 0.2059 | 0.0874 | 0.0352 | 0.0134 | 0.0047 | 0.0016 | 0.0005 | 0.0001 | 0.0000
        r = 1 | 0.8290 | 0.5490 | 0.3186 | 0.1671 | 0.0802 | 0.0353 | 0.0142 | 0.0052 | 0.0017 | 0.0005
        r = 2 | 0.9638 | 0.8159 | 0.6042 | 0.3980 | 0.2361 | 0.1268 | 0.0617 | 0.0271 | 0.0107 | 0.0037
        r = 3 | 0.9945 | 0.9444 | 0.8227 | 0.6482 | 0.4613 | 0.2969 | 0.1727 | 0.0905 | 0.0424 | 0.0176
        r = 4 | 0.9994 | 0.9873 | 0.9383 | 0.8358 | 0.6865 | 0.5155 | 0.3519 | 0.2173 | 0.1204 | 0.0592
        r = 5 | 0.9999 | 0.9978 | 0.9832 | 0.9389 | 0.8516 | 0.7216 | 0.5643 | 0.4032 | 0.2608 | 0.1509
        r = 6 | 1.0000 | 0.9997 | 0.9964 | 0.9819 | 0.9434 | 0.8689 | 0.7548 | 0.6098 | 0.4522 | 0.3036
        r = 7 |        | 1.0000 | 0.9994 | 0.9958 | 0.9827 | 0.9500 | 0.8868 | 0.7869 | 0.6535 | 0.5000
        r = 8 |        |        | 0.9999 | 0.9992 | 0.9958 | 0.9848 | 0.9578 | 0.9050 | 0.8182 | 0.6964
        r = 9 |        |        | 1.0000 | 0.9999 | 0.9992 | 0.9963 | 0.9876 | 0.9662 | 0.9231 | 0.8491
       r = 10 |        |        |        | 1.0000 | 0.9999 | 0.9993 | 0.9972 | 0.9907 | 0.9745 | 0.9408
       r = 11 |        |        |        |        | 1.0000 | 0.9999 | 0.9995 | 0.9981 | 0.9937 | 0.9824
       r = 12 |        |        |        |        |        | 1.0000 | 0.9999 | 0.9997 | 0.9989 | 0.9963
       r = 13 |        |        |        |        |        |        | 1.0000 | 1.0000 | 0.9999 | 0.9995
       r = 14 |        |        |        |        |        |        |        |        | 1.0000 | 1.0000

n = 20  r = 0 | 0.3585 | 0.1216 | 0.0388 | 0.0115 | 0.0032 | 0.0008 | 0.0002 | 0.0000 | 0.0000 | 0.0000
        r = 1 | 0.7358 | 0.3917 | 0.1756 | 0.0692 | 0.0243 | 0.0076 | 0.0021 | 0.0005 | 0.0001 | 0.0000
        r = 2 | 0.9245 | 0.6769 | 0.4049 | 0.2061 | 0.0913 | 0.0355 | 0.0121 | 0.0036 | 0.0009 | 0.0002
        r = 3 | 0.9841 | 0.8670 | 0.6477 | 0.4114 | 0.2252 | 0.1071 | 0.0444 | 0.0160 | 0.0049 | 0.0013
        r = 4 | 0.9974 | 0.9568 | 0.8298 | 0.6296 | 0.4148 | 0.2375 | 0.1182 | 0.0510 | 0.0189 | 0.0059
        r = 5 | 0.9997 | 0.9887 | 0.9327 | 0.8042 | 0.6172 | 0.4164 | 0.2454 | 0.1256 | 0.0553 | 0.0207
        r = 6 | 1.0000 | 0.9976 | 0.9781 | 0.9133 | 0.7858 | 0.6080 | 0.4166 | 0.2500 | 0.1299 | 0.0577
        r = 7 |        | 0.9996 | 0.9941 | 0.9679 | 0.8982 | 0.7723 | 0.6010 | 0.4159 | 0.2520 | 0.1316
        r = 8 |        | 0.9999 | 0.9987 | 0.9900 | 0.9591 | 0.8867 | 0.7624 | 0.5956 | 0.4143 | 0.2517
        r = 9 |        | 1.0000 | 0.9998 | 0.9974 | 0.9861 | 0.9520 | 0.8782 | 0.7553 | 0.5914 | 0.4119
       r = 10 |        |        | 1.0000 | 0.9994 | 0.9961 | 0.9829 | 0.9468 | 0.8725 | 0.7507 | 0.5881
       r = 11 |        |        |        | 0.9999 | 0.9991 | 0.9949 | 0.9804 | 0.9435 | 0.8692 | 0.7483
       r = 12 |        |        |        | 1.0000 | 0.9998 | 0.9987 | 0.9940 | 0.9790 | 0.9420 | 0.8684
       r = 13 |        |        |        |        | 1.0000 | 0.9997 | 0.9985 | 0.9935 | 0.9786 | 0.9423
       r = 14 |        |        |        |        |        | 1.0000 | 0.9997 | 0.9984 | 0.9936 | 0.9793
       r = 15 |        |        |        |        |        |        | 1.0000 | 0.9997 | 0.9985 | 0.9941
       r = 16 |        |        |        |        |        |        |        | 1.0000 | 0.9997 | 0.9987
       r = 17 |        |        |        |        |        |        |        |        | 1.0000 | 0.9998
       r = 18 |        |        |        |        |        |        |        |        |        | 1.0000
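The cumulative binomial and Poisson tables in this appendix can be regenerated directly. This Python sketch prints every value to 4 decimal places rather than leaving cells blank after 1.0000, and the set of λ values passed to poisson_table() is illustrative only.

```python
import math

P_VALUES = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]


def cumulative_binomial(n, p, r):
    """P(X <= r) for X ~ B(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r + 1))


def cumulative_poisson(lam, r):
    """P(X <= r) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(r + 1))


def binomial_table(n):
    for r in range(n + 1):
        cells = " | ".join(f"{cumulative_binomial(n, p, r):.4f}" for p in P_VALUES)
        print(f"n = {n:<2}  r = {r:<2} | {cells}")


def poisson_table(lams, r_max):
    for r in range(r_max + 1):
        cells = " | ".join(f"{cumulative_poisson(lam, r):.4f}" for lam in lams)
        print(f"r = {r:<2} | {cells}")


binomial_table(8)
poisson_table([1.6, 1.8, 2.0, 2.2, 2.4, 2.5, 2.6, 2.8, 3.0], 10)
```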


CUMULATIVE POISSON PROBABILITIES 


The tabulated value is P(X ≤ r) where X ~ Po(λ)


| 0.2 


0.4 0.5 0.6 0.8 
: ‘ 1.0 12 
r=0| 0.8187 | 0.6703 | 0.6065 | 0 = = 
| o.sts7 | 0.6703 |. 5488 | 0.4493 | 0.3679 | 03012 
2] ses | Sau | Gate | Sam | alee | sh | Rae | 858 | 8 
.9999 | 0.9992 ; ; Seip oboe . 
3 | 0.9599 | 0.9992 0.9982 0.9966 | 0.3909 | 0.9810 Wack aes Dead 
4 05999 | 0.9998 | 0.9996 | 0.9986 | 0.9963 | 0.9923 oes | opeis 
5 ; 1.0000 | 0.9998 | 0.9994 | 0.9985 | 0.9968 begs 
é 1.0000 | 0.9999 | 0.9997 | 0.9994 heen 
? 1.0000 } 1.0000 | 0.9999 | 0.9098 
1.0000 | 1.0000 
λ =
eae ie ; an 2.0 22 2A 25 26 2.8 3.0 
=0 fo. 0.1353 | 0.1108 | 0.0907 
0 | o201s | O46 ; 0.0821 | 0.0743 
s eset | Gast | nate | gone | Saou | take | i | ai | oa 
9212 | 0.89 : : rs 
3 | osai2 | o.so13 0.8571 0.8194 | 0.7787 | 0.7576 | 0.7360 peels Here 
$ | 03563 | 93836 | 09473 | 0.9275 | 0.9041 | 0.8912 | 0.874 | 0.8477 0.8153 
€ [oissad | 22895 | 09834 | 0.9751 | 0.9643 | 09580 | 0.9510 | 0.9349 0.916 
S$ | 0.5387 | 9.9974 | 0.9955 | 0.9925 | o9ss4 | o98se | 0.9828 | 0.9756 oe 
a | Sg38% | 09924 | 0.9989 | 0.9980 | 0.9967 | 09958 | 0.9947 osois | ogee 
8 0.9999 | 0.9998 | 0.9995 | 0.9991 | 0.9989 | 0.9985 0.9976 | 0. 
2 0.9999 | 0.9998 | 0.9997 | 09996 | g.9993 | voocs 
10 1.0000 | 1.0000 | 0.9999 | 0.9999 | o'se03 | goose 
1 1.0000 | 1.0000 | 1.0000 | 0.9999 
| 1.0000 
λ =
en a : = 3.5 3.6 3.8 4.0 45° 5.0 5.5 
0 | o.4os | 0.0334 | 0.0302 0.0273 | 0.0224 | 0.0183 | 0.0111 | 0.0067 | 0.0041 
2 | Gazz | O48 | 0.1359 | 0.1257 | 0.1074 | o.os16 | o.06it | 0.0404 0.0266 
3 | oiaoe, | 23397 | 03208 | 0.3027 | 0.2689 | 0.2381 | 0.1736 | 0.1247 0.0884 
2 | eS03s | O5s84 | os366 | osis2 | 0.4735 | 0.4335 | 0.3423 | 0.2650 0.201 
$ | gees | 27442 | 0.7254 | 0.7064 | 0.6678 | 0.6288 | 0.5321 | o.440s 03875 
g | O8286 | 98705 | o.8s76 | o.ssat | osiss | 0.7851 | 0.7029 | o.s1s0 0.5289 
$ | o35s4 | O41 | 0.9347 | 0.9267 | 09091 | 0.8893 | oss | 0.7622 0.6860 
3 | oes | 29769 | 9.9733 | 0.9692 | 0.9599 | o.s4s | 0.134 | ones | 0.809 
5 | 03343 | 29917 | sar | 0.9883 | o.ss40 | 0.9786 | 0.9597 ossi9 | ogsas 
13 | 02382 | 09973 | 0.9967 | 0.9960 | 0.9942 | 09919 | 0.9929 09682 | oiseen 
19 | 03935 | 0.9992 | 0.9990 | 0.9987 | o.seai | 0.9972 | 0.9953 babe |\<0.9747 
12 | £2393 | 9.9998 | 0.9997 | 0.9996 | 0.9994 | 0.9991 | 0.9976 0.9945 | 9.9890 
2 0.9999 | 0.99 0.9999 | 0.9998 | 0.9997 | 0.9992 | 0.9980 | posse 
13 0000 | 1.0000 } 1.0000 | 0.9999 | 0.9997 | g.9993 | goons 
4 1.0000 | 0.9999 | 0.9998 | o'9994 
i 1.0000 | 0.9999 | 0.9998 
1.0000 | 0.9999 
1.0000 
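Each entry in the cumulative Poisson tables is P(X ≤ r) = e^(−λ)(1 + λ + λ²/2! + … + λ^r/r!). The short Python sketch below (standard library only; `poisson_cdf` is an illustrative name, not from the text) rebuilds entries by generating each term from the previous one, since P(X = k) = P(X = k − 1) × λ/k.

```python
import math


def poisson_cdf(r: int, lam: float) -> float:
    """Return P(X <= r) for X ~ Po(lam)."""
    term = math.exp(-lam)      # P(X = 0)
    total = term
    for k in range(1, r + 1):
        term *= lam / k        # P(X = k) = P(X = k - 1) * lam / k
        total += term
    return total


# Compare with printed entries (rounded to 4 d.p. as in the tables).
for r, lam in [(3, 1.0), (4, 2.4), (6, 2.6)]:
    print(f"r = {r}, lambda = {lam}: {poisson_cdf(r, lam):.4f}")
```

It prints 0.9810, 0.9041 and 0.9828, the entries for r = 3, λ = 1.0; r = 4, λ = 2.4; and r = 6, λ = 2.6. Upper-tail probabilities follow from P(X ≥ r) = 1 − P(X ≤ r − 1).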




CUMULATIVE POISSON PROBABILITIES


The tabulated value is P(X ≤ r), where X ~ Po(λ).


9.5 10.0 
6.5 ge th = = * 0,0000 
de 6.0 . : 001 | 0.0001 | 0. 
0.0002 | 0.0 
0009 | 0.0006 j 0.0003 0.0008 } 0.0005 
a oot Bare 0 0073 | 0.0047 } 0.0030 pee cies 0.0042 } 0.0028 
1 | 0. : : 0.0138 } 0. ; : e 
0296 | 0.0203 | 0. 12 | 0.0149 | 0.010 
2 | 0.0620 | 0.0430 | 0. 0.0424 | 0.0301 | 0.02 
0818 0.0591 : 550 0.0403 0.0293 
3 | 0.1512 | 0.1118 | 0! 0.0996 | 0.0744 | 0.0. 
1730 | 0.1321 | 0. 157 | 0.0888 | 0.0671 
4 | 0.2851 | 0.2237 | 0. 0.1912 | 0.1496 | 0.1 
3007 | 0.2414 | 0. 063 | 0.1649 | 0.1301 
5 | 0.4457 | 0.3690 | 0. 0.3134 | 0.2562 | 0.2 
4497 | 0.3782 | 0. 239 | 0.2687 | 0.2202 
6 | 0.6063 | 0.5265 | 0. 0.4530 | 0.3856 | 0.3 . 
0.5987 | 0.5246 | 0. 557 | 0.3918 | 0.3328 
7 | 0.7440 } 0.6728 0.5925 } 0.5231 | 0.4 
0.7291 | 0.6620 | 0. 374 | 0.5218 | 0.4579 
8 | 0.8472 | 0.7916 0.7166 | 0.6530 | 0.5 
0.8305 0.7764 . 060 0.6453 0.5830 
9 | 0.9161 | 0.8774 0.8159 | 0.7634 | 0.7 
0.9015 0.8622 . 030 0.7520 0.6968 
10 | 0.9574 | 0.9332 ogssi | 0.8487 | 0.8 
0.9467 | 0.9208 | 0. 8758 } 0.8364 | 0.7916 
11 | 0.9799 | 0.9661 0.9362 | 0.9091 | 0. 
0.9730 | 0.9573 : 261 0.8981 0.8645 
12 | 0.9912 } 0.9840 cess | 0.9486 | 09 
0.9872 | 0.9784 | 0. 9585 | 0.9400 | 0.9165 
43 | 0.9964 | 0.9929 0.9827 | 0.9726 | 0. 
conceal eA eaeieal (tone 9780 | 0.9665 | 0.9513 
14 | 0.9986 | 0.9970 0.9918 | 0.9862 | 0. 
0.9976 | 0.9954 | 0. 9889 | 0.9823 | 0.9730 
15 | 0.9995 | 0.9988 0.9963 | 0.9934 | 0. 
0.9990 | 0.9980 | 6. 9947 | 0.9911 | 0.9857 
16 | 0.9998 | 0.9996 0.9984 | 0.9970 | 0. 
0.9996 | 0.9992 | 9. 9976 | 0.9957 | 0.9928 
17 | 0.9999 | 0.9998 0.9993 | 0.9987 | 0. 
ecuicial Beak en 9989 | 0.9980 | 0.9965 
aio Bisa td 999 | 0.9997 | 0.9995 | 0. 984 
FA et aa “000 asc ees ee Beene neces 
20 ‘ 1.0000 } 0.9999 | 0. 
24 4.0000 | 0.9999 | 0.9999 | 0.9997 
2 1,0000 | 0.9999 | 0.9999 
os 1.0000 | 1.0000 
24 


THE STANDARD NORMAL DISTRIBUTION FUNCTION


If Z has a normal distribution with mean 0 and variance 1 then, for each value of z, the table gives the value of Φ(z), where

Φ(z) = P(Z ≤ z).

For negative values of z use Φ(−z) = 1 − Φ(z).


z   | 0      1      2      3      4      5      6      7      8      9      | ADD 1 2 3 | 4 5 6 | 7 8 9
0.0 | 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 | 4 8 12 | 16 20 24 | 28 32 36
0.1 | 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 | 4 8 12 | 16 20 24 | 28 32 36
0.2 | 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 | 4 8 12 | 15 19 23 | 27 31 35
0.3 | 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 | 4 7 11 | 15 19 22 | 26 30 34
0.4 | 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 | 4 7 11 | 14 18 22 | 25 29 32
0.5 | 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 | 3 7 10 | 14 17 20 | 24 27 31
0.6 | 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 | 3 7 10 | 13 16 19 | 23 26 29
0.7 | 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 | 3 6 9 | 12 15 18 | 21 24 27
0.8 | 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 | 3 5 8 | 11 14 16 | 19 22 25
0.9 | 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 | 3 5 8 | 10 13 15 | 18 20 23
1.0 | 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 | 2 5 7 | 9 12 14 | 16 19 21
1.1 | 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 | 2 4 6 | 8 10 12 | 14 16 18
1.2 | 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 | 2 4 6 | 7 9 11 | 13 15 17
1.3 | 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 | 2 3 5 | 6 8 10 | 11 13 14
1.4 | 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 | 1 3 4 | 6 7 8 | 10 11 13
1.5 | 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 | 1 2 4 | 5 6 7 | 8 10 11
1.6 | 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 | 1 2 3 | 4 5 6 | 7 8 9
1.7 | 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 | 1 2 3 | 4 4 5 | 6 7 8
1.8 | 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 | 1 1 2 | 3 4 4 | 5 6 6
1.9 | 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 | 1 1 2 | 2 3 4 | 4 5 5
2.0 | 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 | 0 1 1 | 2 2 3 | 3 4 4
2.1 | 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 | 0 1 1 | 2 2 2 | 3 3 4
2.2 | 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 | 0 1 1 | 1 2 2 | 2 3 3
2.3 | 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 | 0 1 1 | 1 1 2 | 2 2 2
2.4 | 0.9918 0.9920 0.9922 0.9924 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 | 0 0 1 | 1 1 1 | 1 2 2
2.5 | 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 | 0 0 0 | 1 1 1 | 1 1 1
2.6 | 0.9953 0.9955 0.9956 0.9957 0.9958 0.9960 0.9961 0.9962 0.9963 0.9964 | 0 0 0 | 0 1 1 | 1 1 1
2.7 | 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 | 0 0 0 | 0 0 1 | 1 1 1
2.8 | 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 | 0 0 0 | 0 0 0 | 0 1 1
2.9 | 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 | 0 0 0 | 0 0 0 | 0 0 0
CRITICAL VALUES FOR THE NORMAL DISTRIBUTION
The table gives the value of z such that P(Z ≤ z) = p, where Z ~ N(0, 1).
p   0.75   0.90   0.95   0.975  0.99   0.995  0.9975  0.999  0.9995
z   0.674  1.282  1.645  1.960  2.326  2.576  2.807   3.090  3.291
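Python (3.8 and later) includes `statistics.NormalDist`; its `cdf` method gives Φ(z) and `inv_cdf` gives the z for which Φ(z) = p, so entries of both normal tables can be checked directly. A minimal sketch:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Phi(1.96): row 1.9, column 6 of the table.
print(f"{Z.cdf(1.96):.4f}")
# Negative z: Phi(-1.96) = 1 - Phi(1.96).
print(f"{Z.cdf(-1.96):.4f}  {1 - Z.cdf(1.96):.4f}")
# Critical values: z such that P(Z <= z) = p.
for p in (0.95, 0.975, 0.995):
    print(p, f"{Z.inv_cdf(p):.3f}")
```

The printed values are 0.9750, then 0.0250 twice (illustrating Φ(−z) = 1 − Φ(z)), then 1.645, 1.960 and 2.576, matching the critical-value table.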




CRITICAL VALUES FOR THE t-DISTRIBUTION 


If T has a t-distribution with ν degrees of freedom then, for each pair of values of p and ν, the table gives the value of t such that P(T ≤ t) = p.


p        0.75   0.90   0.95   0.975  0.99   0.995  0.9975  0.999  0.9995
ν = 1    1.000  3.078  6.314  12.71  31.82  63.66  127.3   318.3  636.6
2        0.816  1.886  2.920  4.303  6.965  9.925  14.09   22.33  31.60
3        0.765  1.638  2.353  3.182  4.541  5.841  7.453   10.21  12.92
4        0.741  1.533  2.132  2.776  3.747  4.604  5.598   7.173  8.610
5        0.727  1.476  2.015  2.571  3.365  4.032  4.773   5.893  6.869
6        0.718  1.440  1.943  2.447  3.143  3.707  4.317   5.208  5.959
7        0.711  1.415  1.895  2.365  2.998  3.499  4.029   4.785  5.408
8        0.706  1.397  1.860  2.306  2.896  3.355  3.833   4.501  5.041
9        0.703  1.383  1.833  2.262  2.821  3.250  3.690   4.297  4.781
10       0.700  1.372  1.812  2.228  2.764  3.169  3.581   4.144  4.587
11       0.697  1.363  1.796  2.201  2.718  3.106  3.497   4.025  4.437
12       0.695  1.356  1.782  2.179  2.681  3.055  3.428   3.930  4.318
13       0.694  1.350  1.771  2.160  2.650  3.012  3.372   3.852  4.221
14       0.692  1.345  1.761  2.145  2.624  2.977  3.326   3.787  4.140
15       0.691  1.341  1.753  2.131  2.602  2.947  3.286   3.733  4.073
16       0.690  1.337  1.746  2.120  2.583  2.921  3.252   3.686  4.015
17       0.689  1.333  1.740  2.110  2.567  2.898  3.222   3.646  3.965
18       0.688  1.330  1.734  2.101  2.552  2.878  3.197   3.610  3.922
19       0.688  1.328  1.729  2.093  2.539  2.861  3.174   3.579  3.883
20       0.687  1.325  1.725  2.086  2.528  2.845  3.153   3.552  3.850
21       0.686  1.323  1.721  2.080  2.518  2.831  3.135   3.527  3.819
22       0.686  1.321  1.717  2.074  2.508  2.819  3.119   3.505  3.792
23       0.685  1.319  1.714  2.069  2.500  2.807  3.104   3.485  3.768
24       0.685  1.318  1.711  2.064  2.492  2.797  3.091   3.467  3.745
25       0.684  1.316  1.708  2.060  2.485  2.787  3.078   3.450  3.725
26       0.684  1.315  1.706  2.056  2.479  2.779  3.067   3.435  3.707
27       0.684  1.314  1.703  2.052  2.473  2.771  3.057   3.421  3.690
28       0.683  1.313  1.701  2.048  2.467  2.763  3.047   3.408  3.674
29       0.683  1.311  1.699  2.045  2.462  2.756  3.038   3.396  3.659
30       0.683  1.310  1.697  2.042  2.457  2.750  3.030   3.385  3.646
40       0.681  1.303  1.684  2.021  2.423  2.704  2.971   3.307  3.551
60       0.679  1.296  1.671  2.000  2.390  2.660  2.915   3.232  3.460
120      0.677  1.289  1.658  1.980  2.358  2.617  2.860   3.160  3.373
∞        0.674  1.282  1.645  1.960  2.326  2.576  2.807   3.090  3.291
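Python's standard library has no t-distribution, so this sketch (an illustration, not the method used to compute the printed table) integrates the t density numerically with Simpson's rule and then solves P(T ≤ t) = p by bisection. The function names and the choice of 2000 Simpson intervals are assumptions made for the example.

```python
import math


def t_pdf(t: float, v: int) -> float:
    """Density of the t-distribution with v degrees of freedom."""
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
    c = math.exp(log_c) / math.sqrt(v * math.pi)
    return c * (1 + t * t / v) ** (-(v + 1) / 2)


def t_cdf(t: float, v: int, n: int = 2000) -> float:
    """P(T <= t) for t >= 0: 0.5 plus Simpson's rule on [0, t] (n even)."""
    h = t / n
    s = t_pdf(0.0, v) + t_pdf(t, v)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 + s * h / 3


def t_critical(p: float, v: int) -> float:
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, v) < p:    # widen the bracket until it contains t
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{t_critical(0.975, 10):.3f}")  # compare with row 10, p = 0.975
```

With ν = 10 and p = 0.975 it should print 2.228. Where SciPy is available, `scipy.stats.t.ppf(p, v)` gives the same quantity directly.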


CRITICAL VALUES FOR THE χ² DISTRIBUTION


If X has a χ² distribution with ν degrees of freedom then, for each pair of values of p and ν, the table gives the value of x such that P(X > x) = p.


p       0.990   0.975   0.950   0.100   0.050   0.025   0.010
ν = 1   0.000   0.001   0.004   2.706   3.841   5.024   6.635
2       0.020   0.051   0.103   4.605   5.991   7.378   9.210
3       0.115   0.216   0.352   6.251   7.815   9.348   11.345
4       0.297   0.484   0.711   7.779   9.488   11.143  13.277
5       0.554   0.831   1.145   9.236   11.070  12.832  15.086
6       0.872   1.237   1.635   10.645  12.592  14.449  16.812
7       1.239   1.690   2.167   12.017  14.067  16.013  18.475
8       1.646   2.180   2.733   13.362  15.507  17.535  20.090
9       2.088   2.700   3.325   14.684  16.919  19.023  21.666
10      2.558   3.247   3.940   15.987  18.307  20.483  23.209
11      3.053   3.816   4.575   17.275  19.675  21.920  24.725
12      3.571   4.404   5.226   18.549  21.026  23.337  26.217
13      4.107   5.009   5.892   19.812  22.362  24.736  27.688
14      4.660   5.629   6.571   21.064  23.685  26.119  29.141
15      5.229   6.262   7.261   22.307  24.996  27.488  30.578
16      5.812   6.908   7.962   23.542  26.296  28.845  32.000
17      6.408   7.564   8.672   24.769  27.587  30.191  33.409
18      7.015   8.231   9.390   25.989  28.869  31.526  34.805
19      7.633   8.907   10.117  27.204  30.144  32.852  36.191
20      8.260   9.591   10.851  28.412  31.410  34.170  37.566
21      8.897   10.283  11.591  29.615  32.671  35.479  38.932
22      9.542   10.982  12.338  30.813  33.924  36.781  40.289
23      10.196  11.689  13.091  32.007  35.172  38.076  41.638
24      10.856  12.401  13.848  33.196  36.415  39.364  42.980
25      11.524  13.120  14.611  34.382  37.652  40.646  44.314
26      12.198  13.844  15.379  35.563  38.885  41.923  45.642
27      12.879  14.573  16.151  36.741  40.113  43.194  46.963
28      13.565  15.308  16.928  37.916  41.337  44.461  48.278
29      14.256  16.047  17.708  39.088  42.557  45.722  49.588
30      14.953  16.791  18.493  40.256  43.773  46.979  50.892
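The χ² table works with upper-tail probabilities. A χ² variable with ν degrees of freedom is a gamma variable with shape ν/2 and scale 2, so P(X > x) = 1 − P(ν/2, x/2), where P is the regularised lower incomplete gamma function. The sketch below evaluates P by its power series using only the standard library; `chi2_upper_tail` is an illustrative name.

```python
import math


def chi2_upper_tail(x: float, v: int) -> float:
    """P(X > x) for X ~ chi-squared with v degrees of freedom.

    Sums the power series for the regularised lower incomplete gamma
    function P(v/2, x/2); the upper tail is 1 - P(v/2, x/2).
    """
    if x <= 0:
        return 1.0
    s, z = v / 2, x / 2
    term = 1.0 / s             # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (s + n)
        total += term
        n += 1
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return 1.0 - lower


# Each tabulated value x should give an upper-tail probability close to p.
for x, v in [(3.841, 1), (18.307, 10), (50.892, 30)]:
    print(v, x, f"{chi2_upper_tail(x, v):.4f}")
```

The three lines should show about 0.0500, 0.0500 and 0.0100, the p values of the columns those entries come from.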




CRITICAL VALUES FOR CORRELATION COEFFICIENTS


These tables concern tests of the hypothesis that a population correlation coefficient ρ is 0. The values in the tables are the minimum values which need to be reached by a sample correlation coefficient in order to be significant at the level shown, on a one-tailed test.


Product-moment Coefficient                          Spearman’s Coefficient
Level: 0.10  0.05  0.025  0.01  0.005 | Sample size | Level: 0.05  0.025  0.01  0.005
4 | 1.0000 
0.8000 «0.9000 ~———:0.9500 0.9800 eae 5 0.9000 1.0000 1.0000 
: 83 0.9343 . 
0.6870 0.8054 0.87 ce oaesg. 209429 
172 6 | 0.8 
8114 0.8822 0.9 0.7857 0.8929 
oo en ; 7545 0.8329 0.8745 Ke Ht 0.7381 (0.8333 
Dae oe 0.7067 0.7887 0.8343 ; 0,6000 0.7000 —-0.7833 
0. . : 0.7977 : : 
4 (0.7498 : 6485 0.7455 
Hie tae nen 0.7155 0.7646 10 | 0.5636 on 094 
0.4428 0. : ; 4 0.6182 0.7 
48 11 0.536 
6021 0.6851 0.73 5874 0.6783 
oar re : oe o.sset 0.7079 | 2 He 05604 0.6484 
: 5 
29 0.6339 0.683 0.5385 0.6264 
0.3802 0.4762 «0.55 0.6614 | 14 | 0.4637 . 

5324 0.6120 : 5214 0.6036 
pike cone ere 0.5923 0.6411 | 15 | 0.4464 0 he 
0.3507 5 : 94 0.5029 . 

226 | 16 | 0.42 

4973 (0.5742 0.6 4877 0.5662 
ve aes oagni 0877 0.6085 | 17 hae oa7ié 050% 
ee 9,400 0.4683 0.5425 0.5897 | 18 0.3912 0.4596 ~—*0.5354. 
ee 0.3887 0.4555 0.5285 0.5751 | 19 3805 0.4466 «0.5218 

a ee 0.4438 0.5155 0.5614} 20 | 0. 0.5091 
0.2992 . ‘ 0.4364 5 
7 | 21 | 0.3704 
4329 —-:02.5034 0.548 4252 0.4975 
eee et eee 0.4921 0.5368 Fe Sea 4160 0.4862 
0. : : 0.5256 : . 7 
4133 0.4815 0.4070 0.475 
0.2774 0.3515 0. 0.5151 24 0.3443 F 
44 0.4716 . 3977 0.4662 
0.2711 tee ae 0.4622 0.5052 25 0.3369 0.39 — 
0.2653 0. 5 : 0.3901 0. 
26 0.3306 S 
2 0.4534 0.4958 3328 0.4487 
0.2598 oe Sones 0.4451 0.4869 | 27 mk 03755 0.4401 
0.2546 : : 0.4785 28 . . 5 
9 0.4372 F 3685 0.432 
0.2497 (0.3172 eed 0.4297 0.4705 | 29 eben ics 0.4251 
0.2454 She 0.3610 0.4226 «= (0.4629 | 30, | 0.306 eee 
0.2407 . : 0.3128 Z 
2638 -—«0.3120«SS«0.3665 «0.4026 | 40 eer 0.2791 0.3293 
saree 0.2353 0.2787 «0.3281 0.3610 | 50 en 402545 (0.3005 
0.1843 ; : 0.3301 60 : : 2782 
2542 0.2997 : 0.2354 0. 
he Pee ; 2352 (0.2776 0.3060 | 70 as 0.2201 —*0.2602 
neve oe 0.2199 0.2597 0.2864 | 80 | 0.185 , 0.2453 
0.1448 . ; 0.2074 : 
1745 —-0.2072,—«0.2449 0.2702 | 90 ee 03967 (0.2327 
os ee 0.1966 0.2324 0.2565 | 100 | 0. 
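For the product-moment columns, a standard result (assuming a bivariate normal population with ρ = 0) is that r√(n − 2)/√(1 − r²) has a t-distribution with ν = n − 2 degrees of freedom. Rearranging gives the critical value r = t/√(t² + ν), where t is the one-tailed critical value from the t-table. A minimal sketch, with `critical_r` as an illustrative name; it does not apply to the Spearman columns, which are not derived this way.

```python
import math


def critical_r(t_crit: float, n: int) -> float:
    """Critical product-moment correlation for a sample of size n."""
    v = n - 2                       # degrees of freedom
    return t_crit / math.sqrt(t_crit ** 2 + v)


# t-values read from the t-table at p = 0.95 (5% one-tailed level).
print(f"{critical_r(2.920, 4):.4f}")    # n = 4, v = 2
print(f"{critical_r(1.734, 20):.4f}")   # n = 20, v = 18
```

The first line gives 0.9000, the n = 4 entry in the 0.05 column; the second (0.3783) can be compared with the n = 20 row.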
0.1292 . : 


65 23 
09 sé 
SS 99 
72 82 
04 21 
87 01 
31 62 
29 81 
39 98 
56 14 


29 56 
93 32 
95 69 
65 71 
90 27 


90 29 
99 74 
87 87 
46 24 
66 79 


36 42 
07 66 
93 10 
49 50 
20 75 


02 40 
59 87 
48 08 
54 26 
35 35 


73 84 
34 64 
68 56 
72 47 
44 44 


28 11 
87 22 
44 93 
81 84 
09 75 


77 65 
19 06 
52 91 
52 47
52 67


66 25 
29 97 
15 25 
82 08 
81 35 


68 00 
76 51 
98 60 
45 44 
28 72 


80 59 
46 53 
57 94 
74 22 
80 10 


62 74 
57 38 
51 54
32 43 
33 43 


42 45 
06 29 
66 91 
17 74 
81 43 


94 58 
25 08 
05 72
63 99 
58 89 


62 09 
21 38 
99 66 
86 75 
58 45 


90 49 
78 00 
87 47 
05 42
96 75 


57 47 
38 88 
14 59 
37 25 
35 21 


05 04
31 61 
87 07 
25 14 
87 40 


71 73 
56 42 
03 68 
65 67 
03 25 


77 82 
04 73 
01 33 
09 53 
73 25 


89 36 
84 40 
35 91 
77 19 
76 52 


12 67 
39 36 
43 19 
64 67 
97 84 


61 34 
20 55 
16 97 
97 37 
40 92 


83 30 
99 27 
18 26 
26 71 
39 04 


00 71 
29 78 
43 38 
44 15 
23 58 


01 21 
92 59
63 06 
88 07 
89 57


61 57 
91 99 
67 40 
90 43 
04 47 


22 18 
34 03 
19 62 
93 91 
63 41 


78 60 
56 90 
92 45 
64 13 
87 24 


58 14 
94 30 
06 93 
04 83 
02 74


41 59 
56 31 
90 70 
12 81 
38 54 


09 35 
87 42 
20 49 
22 55
20 57 


30 13 
72 70 
51 50 
39 03 
84 72 


92 39 
69 48 
36 67 
47 94 
42 73 


09 37 
72 67 
28 13 
20 39 
63 66 


90 29 
67 74 
24 71 
27 55
12 60 


89 88 
16 08 
24 10 
56 62 
54 08 


20 10 
61 55 
32 28 
75 51
91 86 


50 62 
16 75 
53 00 
51 14 
83 59


10 85 
16 74 
85 13 
03 83 
35 81 


60 27 
74 76 
94 24 
29 42 
84 13 


89 33 
72 55
57 25 
65 65 
49 91 


30 39 
11 43 
61 36 
54 83 
88 32 


18 40 
85 32 
68 48 
32 71 
37 93 


80 44 
42 83 
50 25 
20 03 
09 62 


57 06 
58 48 
41 98 
58 74 
42 38 


62 18 
17 76 
11 63 
94 58 
98 44 


81 87 
98 58 
04 91 
49 26 
10 47 


91 04 
74 95 
06 29 
38 28 
04 67 


11 85 
69 59 
23 17 
98 41 
78 49 


64 89 
52 23 
19 35 
04 50 
99 90 


04 28 
73 97 
90 55
48 86 
41 20 


21 52
95 82
96 47 
34 00 
83 24 


03 00 
16 46 
31 69 
72 91 
11 07 


50 37 
65 21 
47 93 
58 54 
80 92 


68 73 
92 09 
79 06 
82 08 
77 36 


93 67 
27 47 
40 47 
49 03 
08 16 


05 69 
83 50 
42 48 
49 41 
80 70 
95 97 
99 26 
46 43 
24 30 
31 52 


SF 41 
04 38 
25 51 92 04
67 41 01 38
52 67 61 40


47 45 18 21 
72 95 96 06
50 22 23 72
62 34 36 81
22 55 41 04


44 75 01
98 36 
26 20 
10 88 
17 64 


59 28 
75 37 
76 68 
74 61 
67 01 


12 90 
19 31 
68 58 
34 18 
28 77 


32 70 
34 79
11 15 
80 29 
14 55 


51 10 
42 20 
07 18
42 28 
45 69 


57 32 
52 14
07 56 
84 22 
44 86 


43 70 
01 48 
65 24 
20 83 
56 87 


64 16 
01 63
46 66 
39 62 
26 21 


45 25 
30 20 
52 31 
66 83 
60 50 


69 84 
36 83 
87 34 
43 07 
72 37 
87 45 
76 09 
43 73 
40 18 
66 87 


08 76 
77 43 
50 56 
43 63 
70 19 


91 65 
86 36 
45 86 
32 14 
60 47 


71 86 
47 86 
28 30 
06 97 
21 48 


63 08 
63 80 
16 49 
25 32 
32 70 


21 17
35 68 
62 74 
47 98 
20 52 


42 05 
08 67 
87 68 
43 22 
89 94 


32 80 
54 18 
85 05 
23 90 
87 28 


Each digit in this table is an independent sample from a population where each of the digits 0 to 9 has a probability of occurrence of 0.1. It should be noted that these digits have been computer generated, and are therefore ‘pseudo’ random numbers.
be noted that these digits have been 


ANSWERS 


Chapter 1 
Exercise 1a Stemplots (page 8) 


NOTE: There are alternative formats 
1. (a) SO] 2 


Key: 85 | 1 means 86 




Key: 2|7 means 27 


Key: 5 | 3 means 5.3 cm




010268 


Key: 7 | 3 means 7.3 hours


6. (a) 7.4 hours, 0.5 hrs 
(b) 0.074 g, 0.005 g


7. (a) Before After 
8] 4 
7311015 
99664] 6|9 
953300 7(O0S5577 
1 8/001446 
3333100] 9/567 
55 |10)444689 
110/11; 7 
12] 5 
13|00177 
14135 


Key: 9 | 7 means 79


Key: 8 | 4 means 84


Rate much faster after exercise.


(b) School A School B 
9875331/2}359 
9997774331113 \|46688 
88886655500 ]4/012234556779 
944331115 ]/002244666788999 
11610 
Key: 9 | 5 means 59, 5 | 9 means 59


Older teaching staff in School B 


(c) Boys Girls
2/455 
33322/2]222222 
10 }2/11 
9O9BRIT1L 188999 
6666/1 /6677 
S54 ]1 
1 
1yi 
910 


Key: 8 | 1 means 0.18 s


Key: 1 | 8 means 0.18 s


Boys have faster reaction times.
Girls’ reaction times are more consistent.


Exercise 1b Histograms and frequency polygons (page 21)


1. Boundary points 5, 10, 20, 25, 40, 45 


f.d. 0.4, 1.2, 1.4, 1, 0.4 


(Histogram: horizontal axis time (s), 5 to 45)


2. (a) Mass (g) | Frequency | f.d.
85-89   | 4  | 0.8
90-94   | 6  | 1.2
95-99   | 7  | 1.4
100-104 | 13 | 2.6
105-109 | 10 | 2
110-114 | 5  | 1
115-119 | 5  | 1


(Histogram: horizontal axis mass (g), 84.5 to 119.5)
Modal class is 100-104.


() 8/6678 


3. Boundary points 0, 25, 60, 80, 150, 300 
f.d. 2.48, 2, 4.4, 4, 0.2


(Histogram: horizontal axis time (mins), 0 to 300)


4. Boundary points 40.5, 50.5, 55.5, 60.5, 70.5, 75.5
f.d. 2.1, 12.4, 11, 5, 2.4


(Histogram: horizontal axis mass (kg), 40.5 to 75.5)


3 


Speed O- 20. 24 30-32 38-48 OO 
frequency 20 24 24 16 12 10 6 8 


6. Boundary points 176.5, 186.5, 191.5, 196.5, 201.5, 206.5, 216.5
f.d. 1.2, 1.6, 1.6, 1.8, 1.4, 0.6


(Histogram: horizontal axis height (cm), 176.5 to 216.5)


4 
7. Plot polygon at (0.75, 2), (2.25, 43), (4.5, 73s (95 33), 
(13.5, 2), (18, 1).


8. 

Number of occurrences of e | Frequency | Width | f.d.
02 1 3 4 
3-5 5 3 es 

6-8 6 3 2 

9-11 3 3: 1 
12-14 5 3 ig 
15-17 4 3 14 

Plot boundaries at —0.5, 2.5, 5.5, 8.5, 11.5, 14.5, 17.5 

or at 0, 3, 6, 9, 12, 15, 18 


9. Plot polygon at (18, 17.5), (22.5, 94), (27.5, 107),
(…5, 56), (40, 11.8).
… 25-30, skewed with a tail to the right.
(Other answers possible.)
10. Boundary points −0.5, 9.5, 14.5, 19.5, 29.5, 39.5, 59.5
or 0, 10, 15, 20, 30, 40, 60 (say)
f.d. 0.5, 1.6, 6.4, 4.1, 1.6, (0.1).
11. Boundary points 9.5, 29.5, 39.5, 49.5, 59.5, 64.5, 69.5, 84.5
or 10, 30, 40, 50, 60, 65, 70, 85
or 9, 29, 39, 49, 59, 64, 69, 84
f.d. 1.4, 1.8, 2.2, 2.4, 2.8, 2.4, 1.6
12. 6, 8, 8, 6, 4, 10
13. Take boundary points 50, 100, 150, 200, 250, 300.
Plot (75, 0.12), (125, 0.28), (175, 0.2), (225, 0.12),
(275, 0.08)
Plot (75, 0.04), (125, 0.12), (175, 0.2), (225, 0.32),
(275, 0.12)


14. (Graph: horizontal axis height (cm), 45 to 65)
The maize seedlings showed a tendency to grow taller with 
the stronger solution. 
15. (a) a = 20, b = 26, c = 12
(b) 88 


Exercise 1c Pie charts (page 26) 


1. (a) 154°, 26°, 64°, 116°
(c) 5.54 cm
2. 208°, 46°, 38°, 36°, 32°; 5.25 cm
3. 66°, 156°, 24°, 42°, 72°; 5.5 cm, 6 cm; 50°
4. (a) £120 000 (b) 68 000 (c) 90°, 27°, 9°, 30°; 7.5 cm
5. (a) 42 (b) 40° (c) 94; 420, 30.0 cm
6. (a) 86°, 38°, 32°, 20°, 168°, 16° (b) 5.5 cm
7. (a) £2000, £8000 (b) £400 (c) 27° (d) 80°
8. 28.8°, 72°, 115.2°, 144°; 180
9. (a) £4500 (b) 1550, 1650 (c) 132°, 24°; 8m


Exercise 1d The mean (page 34) 


1. (a) 9.7 (b) 154.8 (c) 51.375 (d) 17733
(e) 0.908 (3 s.f.) (f) 4 (g) 29.54 (h) 122.82
2. 49.3 
3. 45 (2 sf) 
4. (a) Boundary points 0, 5, 10, 15, 20, 40
f.d. 2.4, 7.6, 8.4, 4, 0.4
(b) £11.92
5. Boundary points 0, 15, 30, 50, 70, 100
f.d. 3.6, 5.2, 6, 4.4, 2
43.35 years
6. 21.4cm 
7. (a) There should not be gaps between the bars. Heights
should be adjusted so that area ∝ frequency.
(b) Boundary points 4.5, 9.5, 12.5, 15.5, 18.5, 28.5
f.d. 2.8, 6, 5, 14, 0.8


(Histogram: horizontal axis height (cm), 4.5 to 28.5)
(c) 12.9 m (3 s.f.)


8. (a) Boundary points 9.5, 19.5, 24.5, 29.5, 30.5, 34.5,
39.5, 59.5
f.d. 2, 4, 3, 14, 4, 2, 0.5
(b) 28 seconds. 


Exercise 1e Weighted means (page 36)
1. 10.4 


2. Class teacher, 1.65% 
3. 40.6 


Exercise 1f Mean and standard deviation 
(page 44) 


1. (a) 5, 2 (b) 8.5, 1.80 (c) 18.8, 6.46 (d) 108, 4.10
(e) 3.42, 1.91 (f) 205, 3.16
2. (a) f.d. 0.2, 0.32, 0.625, 1.04, 0.08


(Histogram: horizontal axis wage (£), 200 to 500)
(b) £338.25, £59.60 

3. 69.3, 1.7 

4. 115.8, 7.58 

5. 16.6 seconds, 2.63 seconds 

6. 6.8, 1.11 


7. (a) 2 min 38 sec, 1 min 54 sec 
(b) Histogram f.d. 6, 10, 15, 2.5, 0.8 
Frequency polygon: plot (0.5, 6), (1.5, 10), (2.5, 15), 
(4, 2.5), (7.5, 0.8) 
8. 29, 5.9 
9. 510 
10. 5
11. (a) 10 (b) 11.7
12. (a) 121, 6.19 (b) 14, 1703.8 (c) 1716, 3.59
(d) 1026, 58 770
13. (a) Frequency = 5 + 18 + 22 + 28 + 22 + 18 + 5 = 118
(Area = f.d. × width)
(b) Symmetric. Midpoints of intervals have been taken to
represent the interval.
(c) 3.5 mm (2 s.f.)
14. 28.15, 3.84 
15. 5.3 
16. 30.0 mph, 5.85 mph 


Exercise 1g Mean and standard deviation (page 50)


L. 
zi 

3, 
4, 
5. 
6. 
7. 
8. 
9. 
0. 
1. 
2 
3 


4, 
LS. 
16. 


19 


2.3, LAL 
11.7%, 2.2% 


(a) 4.6, 2 (b) 4.56, 2.04


‘0 
6011 Key: 454 means 49 


Features: modal class 30-34, skewed to the right, 61 
extreme value (outlier), 36.87, 35.59 

£195.45, £14.12 

11.87, 0.80 

4.44 


Exercise 1h Scaling sets of data (page 55)


1. (a) 62.14 (b) 516, 2.14 (c) 78, 27.8

2. 50, 12

3. (a) μ + k, σ (b) pμ, pσ; 3μ + 5, 3σ

4. (a) 2 (b) 200 (c) 2.02 (d) −4, −1, 2, 5, 8, 11, 14
5. (a) a = 3, b = 22 (b) 70 (c) 76


6. (a) 38, 8.99 (b) 34, 77
7. a = 1.6, b = 10


8. a = 0.8, b = −5; 6.25


9. (b) Take mark intervals 0 < mark < 10, 10 < mark < 20, etc.
f.d. 0.1, 0.8, 1.9, 2.8, 2.5, 1.7, 0.7, 0.3, 0.1;
boundaries 0, 10, 20, 30, 40, 50, 60, 70, 80
(c) midpoints 5, 15, 25, etc.; 40.4, 15.45
(d) a = 24, b = 0.65 (2 s.f.)


10. (a) 12.5 (b) 20; 80, 5.
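The scaling results above say that adding a constant k shifts the mean to μ + k but leaves σ unchanged, while multiplying by p gives pμ and pσ (for p > 0), so 3x + 5 has mean 3μ + 5 and standard deviation 3σ. A quick check in Python, using illustrative data chosen to have mean 5 and standard deviation 2 (not data from the exercise); `pstdev` uses divisor n.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative: mean 5, s.d. 2
shifted = [x + 10 for x in data]     # x + k with k = 10
scaled = [3 * x + 5 for x in data]   # 3x + 5

for values in (data, shifted, scaled):
    print(mean(values), pstdev(values))
```

The three lines show 5 2.0, then 15 2.0, then 20 6.0.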


Exercise 1i Coding (page 58) 


1. (a) 313.76, 5.19 (b) 431, 132 (c) 0.0171, 0.00818
2. 54,235, 0.927

3. 89.3275

4. 31.7 mins

5. 71.2, 3.82

6. 465 secs


Exercise 1j Cumulative frequency, median and 
quartiles — ungrouped data (page 73) 


1. (a) 9 (b) 207 (c) 1896 (d) 0.55


2. (a) 61 (b) 52 (c) 73 (d) 21


23 

3. 

4. (a) 46,35 (b) 1.8, 1.2 (c) 20.5, 11.5 
5. (a) 7, 2 (b) 14, 3


6. (a) ihumber of goals. 0° <1-<2°<3-<4.<5.56 
‘cumulative O21 4 56 11-19-25 
frequency: 
(b) (Cumulative frequency graph: horizontal axis number of goals)

(c) 5 
(d) 2 


7. (a) 2 (b) 3 (c) 2.47 (d) 1.94
8. (a) 2, 3 (b) 2


(c) It only considers the middle 50% and does not take
account of large families.


Exercise 1k Cumulative frequency, median and 
quartiles — grouped data (page 81) 


Some answers are approximate and depend on the curve drawn.


1. (a) Mass (kg) | cumulative frequency
< 39.5 | 0
< 44.5 | 3
< 49.5 | 5
< 54.5 | 12
< 59.5 | 30
< 64.5 | 48
< 69.5 | 51
< 74.5 | 52
Plot (39.5, 0), (44.5, 3), (49.5, 5), (54.5, 12), (59.5, 30),
(64.5, 48), (69.5, 51), (74.5, 52). Join with a smooth curve.


(b) 21 (c) 1
2. (a) f 


(d) 62 kg (e) 58.4 kg (f) 7.2 kg


(Cumulative frequency curve: horizontal axis pH value, 4.4 to 8.4)
44 48 5.2 


(b) 82%


(c) 6.5, median


(d) Histogram to show pH value (horizontal axis pH value, 4.4 to 8.4)
3. mass (g) | cumulative frequency
< 50 | 3
< 54 | 5
< 58 | 10
< 62 | 22
< 66 | 32
< 70 | 38
< 74 | 40


Plot (50, 3), (54, 5), (58, 10), (62, 22), (66, 32), (70, 38),
(74, 40)


Median mass = 61.3 g
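Reading a median from a cumulative frequency curve can be imitated by linear interpolation on the cumulative frequency polygon. This sketch assumes the median sits at cumulative frequency n/2 (conventions vary, and answers read from a drawn curve will differ slightly); `grouped_quantile` is an illustrative name.

```python
def grouped_quantile(boundaries, cum_freq, q):
    """Estimate the q-quantile of grouped data by linear interpolation on
    the cumulative frequency polygon, taking position q * n."""
    target = q * cum_freq[-1]
    points = list(zip(boundaries, cum_freq))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("position lies outside the tabulated cumulative frequencies")


# Question 3: upper class boundaries (g) and cumulative frequencies.
masses = [50, 54, 58, 62, 66, 70, 74]
cf = [3, 5, 10, 22, 32, 38, 40]
print(f"{grouped_quantile(masses, cf, 0.5):.1f}")   # median
```

It prints 61.3, agreeing with the median above. With `q = 0.75` and the data of question 1 (boundaries 39.5 to 74.5) it gives 62.0 for the upper quartile.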


4. (a) 


time (minutes) 


cumulative frequency: 


< 5
< 10
< 15
< 20
< 25
< 30
< 35
< 40


30 


(b) 24 (c) 26 (d) 23 (e) 25 min
5. (a) 687.5 hours (b) 133 hours
6. (a) 80.75 g (b) 215


7. (a) 


(Cumulative frequency curve to show maximum temperatures: horizontal axis temperature °C)


(b) 12°C (c) 80 (d) Approx. 10%




8. time (mins) | cumulative frequency
< 39.5 | 0
< 44.5 | 8
< 49.5 | 30
< 54.5 | 64
< 59.5 | 94
< 64.5 | 120


For the curve, plot (39.5, 0), (44.5, 8), (49.5, 30),
(54.5, 64), (59.5, 94), (64.5, 120) and join the points with a
smooth curve.


(a) 9 mins (b) Approx. 11%; 56 mins


9. distance (km) | cumulative frequency
0     | 0
< 4   | 1
< 10  | 3
< 20  | 9
< 35  | 28
< 60  | 40
< 100 | 50
(Cumulative frequency curve to show distances travelled: horizontal axis distance (km), 0 to 100)
(a) 32 km (b) Approx. 30 km (c) Approx. 54%


10. price (£x) | cumulative frequency
< 75  | 0
< 95  | 6
< 100 | 16
< 105 | 28
< 110 | 41
< 120 | 48
< 135 | 53


Plot (75, 0), (95, 6), (100, 16), (105, 28), (110, 41),
(120, 48), (135, 53)
(a) £104 (b) £13 (c) 47


11. x = 25, y = 17
12. Plot (405, 0), (415, 4), (425, 7), (435, 13), (445, 23),
(455, 28), (465, 30).
437, 412.5, 453.
13. Plot (80, 0), (85, 6), (90, 18), (95, 40), (100, 74), (105, 86),
(110, 93), (115, 97), (120, 99), (125, 100)
(a) … (b) 10 mins (c) 62
14. Plot (165, 0), (170, 18), (175, 55), (180, 1…), …,
(190, 228), (195, 250)
(a) 180.5 cm (b) 175.5 cm (c) 187 cm (d) 189.5
15. (a) 57 mins (b) 71.5 mins (c) 32%.




16. Plot (69.5, 0), (74.5, 8), (79.5, 28), (84.5, 53), (89.5, 84),
(94.5, 94), (99.5, 100).
9.3 secs, 22, 75.5 secs.

17. 50p, £4.96, £5.96. Large amounts affect the mean but not 
the median 

18. Histogram: frequency densities 0.2, 0.5, 0.9, 0.8, 0.1


thickness (mm)                0   < 20   < 30   < 40   < 50   < 60
cumulative number of strata   0   2      7      16     24     25


Plot (0, 0), (20, 2), (30, 7), (40, 16), (50, 24), (60, 25)
36 mm, 15 mm, 0.24.


Exercise 1l Skewness (page 90)


1. (a) 0.535 (b) −0.674

2. −2.4

302

4. (a) Frequency densities: 0.8, 3, 5, 1.8, 1.2, 0.47, 0.2


(b) Positively skewed 

5. Vertical line graph, 2, 3, 3.53, 1.985, 0.801, 0.771 

6. −0.482

7. (a) B (b) A (c) C

8. (a) (i) 0.75 (ii) 0.28
(b) Frequency densities: 0.2, 1, 1.2, 1.8, 2.8, 0.6, 1.2, 1,
0.4, 0.2

9. (a) 9.6 mins, 1 min (b) 0.33
(c) (4.65 mins, 14.61 mins)
(d) (4.3 mins, 15.27 mins)

10. (a) 0.143 (Q1 = 17, Q2 = 26, Q3 = 38)
(b) 0.0668 (Q1 = 11.9, Q2 = 16.1, Q3 = 20.9)
(c) 0.333 (Q1 = 9, Q2 = 11, Q3 = 15)
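The skewness values in question 10 are consistent with the quartile coefficient of skewness (Q3 − 2Q2 + Q1)/(Q3 − Q1), which is positive when the upper quartile is further from the median than the lower quartile. A sketch (`quartile_skewness` is an illustrative name):

```python
def quartile_skewness(q1: float, q2: float, q3: float) -> float:
    """Quartile coefficient of skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


for q in [(17, 26, 38), (11.9, 16.1, 20.9), (9, 11, 15)]:
    print(q, f"{quartile_skewness(*q):.4f}")
```

It prints 0.1429, 0.0667 and 0.3333. Part (b) is given as 0.0668, presumably because unrounded quartiles were used.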


Exercise 1m Box plots (page 99) 


1. (a) Plot (0, 0), (1, 8), (2, 19), (3, 36), (9, 44), (10, 50)
(b) 2.35 mins, 1.4 mins, 3.4 mins
(c) Positively skewed


(Box plot: horizontal axis length of call (mins), 0 to 10)
2. Group 1: Q1 = 0.17, Q2 = 0.21, Q3 = 0.23; times from
0.14 to 0.26
Group 2: Q1 = 0.16, Q2 = 0.19, Q3 = 0.22; times from
0.09 to 0.25
(Box plots for Group 1 and Group 2: horizontal axis reaction time (secs), 0.08 to 0.26)
3. Q1 = 22, Q2 = 35, Q3 = 51; whiskers from 16 to 97.


Boundary for outliers 94.5; outlier 97
(Box plot: horizontal axis length of line (mm), 10 to 97)
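The boundary of 94.5 in question 3 is consistent with the usual rule that a value more than 1.5 × IQR beyond the nearer quartile is an outlier: 51 + 1.5 × (51 − 22) = 94.5. A sketch of that rule (the factor 1.5 is a convention, and `outlier_fences` is an illustrative name):

```python
def outlier_fences(q1: float, q3: float, k: float = 1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = outlier_fences(22, 51)
print(lower, upper)
extremes = [16, 97]                       # whisker ends quoted in the answer
print([x for x in extremes if x < lower or x > upper])
```

It prints -21.5 94.5 and then [97], identifying 97 as the outlier.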


4. (a) u.c.b. 0, 20, 30, 40, 50; c.f. 0, 20, 40, 65, 69;
Q1 = 17.5, Q2 = 27.5, Q3 = 35; 7.5, 10; negatively
skewed
(b) u.c.b. 0, 20, 40, 80, 100; c.f. 0, 4, 10, 34, 44;
Q1 = 41.7, Q2 = 60, Q3 = 78.3; 18.3, 18.3; negatively
skewed, zero quartile skewness
(c) u.c.b. 0, 5, 10, 15, 20, 25, 35; c.f. 0, 1, 6, 9, 14, 12,
13; Q1 = 7.25, Q2 = 10.8, Q3 = 16.875; 6.075, 3.55;
positively skewed
(d) u.c.b. 0, 5, 10, 15, 20, 25, 30; c.f. 0, 5, 20, 45, 90,
140, 160; Q1 = 14, Q2 = 18.9, Q3 = 23; 4.1, 4.9;
negatively skewed

5. x̄ = 63.9, s = 29.5; outliers would be less than 4.9 mins or
greater than 122.8 mins; outliers are 133, 144.

6. Compare median, quartiles, range, skewness 
7. December: Q1 = 0.3, Q2 = 1.8, Q3 = 2.75
July: Q1 = 4.1, Q2 = 6.5, Q3 = 9.8


(Box plots for December and July: horizontal axis hours of sunshine, 0 to 10)
8 (a) OJ 12259 

1]00235799 
2125999 
3.02 
: 3 78 Key: 2| 5 means 9.25 a.m. 

(b) 9.19 a.m. 


{c) 9,10 a.m., 293 minutes past 9. 
(d) (Box plot: horizontal axis time of delivery, 9.00 to 9.50)


(b) Q1=4, Q)=6, Q3=7 


12. 8 4.5 6 7-8" 
10. (a) 6,5 

(b) More than 3 standard deviations from the mean 

(c) (i) older brother or sister also attended
(ii) a mistake had been made 

(d) 5.5, 5 

(e) decrease 

(f) positive, less


11. (a) height gain (grams)
36 | 099
37 | 6
38 |
39 | 1779
40 | 237
41 | 00
42 | 057
43 | 04
44 | 5


(b) Draw plots. New corn: whiskers from 360 to 462,
Q1 = 397, Q2 = 450, Q3 = 426; Standard corn:
whiskers from 321 to 423; Q1 = 353, Q2 = 368.5,
Q3 = 383


12. (a) Stem| Leaf 
1234467788 
0222346788 
023366778 
00224467888 
01255667 


(b) Q2 = 66 miles, Q1 = 52 miles, Q3 = 78 miles


(Box plot: horizontal axis distance (miles), 40 to 90)
(d) (i) Gives a visual impression of the data whilst
keeping the details.
(ii) Gives an immediate impression of an
approximately symmetrical distribution with the
middle 50% lying between 52 and 78 miles.


Miscellaneous exercise 1n (page 110) 


1. (a) x̄ = 5.42, s = 0.33; range = 1.79, Q2 = 5.46,
Q1 = 5.295, Q3 = 5.615, outlier = 4.07


(b) (i) 5.465
(ii) 5.47
(iii) 0.22


(Box plot showing the outlier: horizontal axis specific gravity, 4.00 to 6.00)
2. (a) Boundary points for histogram 689.5, 709.5, 719.5, 729.5,
739.5, 744.5, 749.5, 754.5, 759.5, 769.5, 789.5
(first interval l.c.b. 689.5, u.c.b. 709.5)
f.d. 0.15, 0.7, 1.5, 3.8, 8.2, 7, 4.2, 3.2, 1.4, 0.5
(b) Plot (689.5, 0), (709.5, 3), (719.5, 10), (729.5, 25),
(739.5, 63), (744.5, 104), (749.5, 139), (754.5, 160),
(759.5, 176), (769.5, 190), (789.5, 200)
(c) 744.24, 14.86
(d) 744.01, 736.08, 752.12
(e) 0.046
(f) 0.011
(g) In box plot, draw whiskers from 689.5 to 789.5, with
median and quartiles as in (d).
. 16,6 (a) 5.86 (b) 15,7 
. 35 yrs 1 month, 11 yrs 3 months, 
is eae = 33 yrs 9 months, IQR = 17 yts 11 months 
5. (a) 44.5
(b) 51.75
(c) (Cumulative frequency curve: horizontal axis 9.5 to 89.5)
Q1 = 40, Q3 = 64


334 Key: 4|2 means 42. 


( 
(e) yes 

6. (a) (i) 49.66 (ii) 433.97 (iii) 20.83
(b) c.f. 3, 9, 18, 28, 40, 58, 72, 83, 88


(c) Plot (0, 0), (10, 3), (20, 9), (30, 18), (40, 28), (50, 40),
(60, 58), (70, 72), (80, 83), (90, 88)
(d) (i) 52 (ii) 32
(e) 11
7. c.f. 11, 39, 77, 111, 138, 150
Take as boundaries 0.90, 1.15, 1.30, etc. or 0.91, 1.16,
1.31, etc. or 0.905, 1.155, etc. Median ≈ £1.30.


Q) = £5.20, range = £8.85 
(b) = £6, s= £2.47 


a K = £6.30, s = £2.59 
mean remains the same; low id 
benefit under scheme B. a ea 
9. (a) 8,6, 44,3 


(Graph: horizontal axis length (mm), 0 to 100)
(c) Approx. 2.5 mm (modal class is 0 < …)
(d) (i) 39.9 mm
(ii) 35 mm
10. (a) 275
(c) Comparative bar chart
11. (a) It becomes 39
(b) x = 3x − 141 does not have an integer solution.
12. 100.7 mm, 0.4 mm; machine B ia 4 
variation with machine A Poe ey eee 


13. (a) 1, could be 1 or 2 
(b) Positive skew, possible outlier 
(c) 2, 1.7; more than 3 standard deviations from the
mean 
(d) (A) a mistake. 
(B) could be correct. 
(e) 1.88, 1.48 


14. (£1000) | < 20 | < 50 | < 60 | < 70 | < 100 | < 150
c.f.     | 0    | 540  | 1690 | 3010 | 3870  | 4320


Plot (20 000, 0), (50 000, 540), (60 000, 1690), 
(70 000, 3010), (100 000, 3870), (150 000, 4320). 
Q, ~ £63 000, IQR: a value between £18 000 and £23 000 
is acceptable 
15. f.d. 0.93, 2.4, 1.4, 1.6, 0.9 


Histogram to show age distribution (horizontal axis age (years), 20 to 60)


(a) 40.15 (b) 35} yrs. 


16. [oie (mins) “Frequency Frequency density 
Osxsd 20 20 
tex<2 a. WA ee WA 
2<x<2.5 SL. 102. 

ds<xe3 5 AB 
3ox<5. 138 69. 
S<x <0 85 8 17 


Histogram to show length of call 


time (mins) 


34 mins, divides area in half. 
17. (a) 8, 9.5 mins 
(b) Boundaries 0, 5, 10, 15, 20, 25, 30;
f.d. 8, 11.2, 5.6, 4, 2.4, 0.8
(c) 10 
(d) A False, B True, C False, D True 


Mixed test 1A (page 114) 


1. t             | f  | f.d.
65 ≤ t < 85   | 25 | 1.25
85 ≤ t < 95   | 28 | 2.8
95 ≤ t < 105  | 20 | 2
105 ≤ t < 115 | 17 | 1.7
115 ≤ t < 155 | 10 | 0.25

(a) Histogram to show times to complete half-marathon (horizontal axis time (mins), 65 to 155)
(b) 96.15 mins
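The mean time in (b) can be checked by representing each class by its midpoint (75, 90, 100, 110 and 135) and using x̄ = Σfx/Σf. The sketch also computes the standard deviation with divisor n, which is not asked for here; `grouped_mean_sd` is an illustrative name.

```python
import math


def grouped_mean_sd(midpoints, frequencies):
    """Mean and standard deviation (divisor n) of grouped data, with each
    class represented by its midpoint."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))
    mean = sum_fx / n
    sd = math.sqrt(sum_fx2 / n - mean ** 2)
    return mean, sd


# Midpoints of 65-85, 85-95, 95-105, 105-115, 115-155.
mean, sd = grouped_mean_sd([75, 90, 100, 110, 135], [25, 28, 20, 17, 10])
print(f"mean = {mean:.2f}, s.d. = {sd:.2f}")
```

It prints mean = 96.15, s.d. = 17.58.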


2. (a) 7,6, 4,8 
(b) 6.55, 5.7, 8.1 


(Box plot: horizontal axis blood glucose level (mmol/l), 5.0 to 10.0)


(d) Positive skew. 
3. (a) 4.5 (b) 1.5 

(c) No change to mean, standard deviation is increased. 
4. (a) Pie chart, bar chart 

(b) Children in school, sample not representative.

(c) f.d. 3.6, 6.4, 4.4, 1.4, 0.7

Histogram to show words per sentence in a magazine (classes 1-5, 6-10, 11-15, 16-25, 26-45; horizontal axis words per sentence)
NB: boundary points could be 0.5, 5.5, 10.5, 15.5, 25.5,
45.5
(d) 13.8, 10.2 
(e) 9.41 
5. (a) Histogram
(b) Individual values are not known and mid points have 
been taken as representatives of the intervals. 
(c) 69.5, 7.6 
(d) Median - no effect, IQR - no effect, 
mean — increased. 
6. (a) 7 15, 35, 20, 13, 10 
(b) 9, 5.43, 14.5
(c) Box plots for male employees and female employees (horizontal axis time (years))


Mixed test 1B (page 116) 


1. (a) 1.15 (b) 4 (c) 1.09

2. (a) (i) Easier to see the spread
(ii) 1 | 1223444 …


Key: 1 | 5 is 15 cm


(b) 24.6 cm 
(c) 21 cm
3. (a) Median better; distribution not symmetrical.
(b) 33°
4. (a) Median same for both.
B has 3 outliers; ignoring these, B’s … time would be lower.
B’s times are less variable than A’s.
A’s times are positively skewed, B’s are negatively skewed.
(b) (i) If outliers are not the Post Office’s fault, choose B
for quicker service.
(ii) If outliers are the Post Office’s fault then they could
happen again and there could be a long wait. A avoids long waits.
5. (a) 0 | 6 7 8 8
1 | 0 0 1 2 2 3 3 3 4 4
1 | 5 6 6 6 6 7 7 7 8 8 9 9
2 | 0 2 3
2 | 7
(b) Q₂ = 15.5 mins, Q₁ = 12 mins, Q₃ = 18 mins.
(c) [Box plot; horizontal axis: Times (mins), 5 to 30]
6. (a) (i) 3 hrs 3 mins
(ii) Q₁ = 2 hrs 42 mins, Q₃ = 3 hrs 42 mins
(b) 40, (200), 200, 60
(c) (i) 3 hrs 20 mins (ii) 54 mins


Chapter 2 


Exercise 2a Equations of least squares regression lines (page 136)
1. Data set 1 


(a) y = 4.50 + 0.64x
(b) x = 4.42 + 0.75y
[Scatter diagram]
Good positive correlation




Data set 2 
(a) y = 90.31 - 1.78x
(b) x = 37.80 - 0.39y
[Scatter diagram]
Good negative correlation
2. (a) [Scatter diagram; horizontal axis: temperature, 110 to 170]
(b) y = 0.614 + 0.0207x
3. y = -2.59 + 0.652x; 36.5
4. F = -6.33 + 0.901, F = 20.8
5. y = 3.8 + 1.6x, x = -2.06 + 0.59y
6. (a) y = 15.834 + 0.72x (b) 66 (c) 59
7. (a) [Scatter diagram; vertical axis: total cost (£1000), horizontal axis: units of output (1000's)]
(b) 20.7 + 0.96x
(c) 31.000 ... 000 units (d) Break-even point.
841.2% 
10. c=15,d=-5 


11. (a) [Scatter diagram; vertical axis: annual salary (£)]

(b) y = 3710 + 192x

(c) Appears reasonably satisfactory apart from B and C, who have earned substantially more than the equation suggests.

(d) (i) y = 4210 + 192x
(ii) y = 4010 + 207x
(iii) y = 4160 + 200x

(e) It would contain a term for employees who work away from home, e.g. y = a + bx + c, where c ≈ £3000 for employees who work away from home and zero otherwise.

12. 0.3, 0.6 
13. (a) [Scatter diagram; horizontal axis: % additive]


(b) y = 127 + 1.17x
(c) [Graph; horizontal axis: temperature]
(d) Argument invalid since relationship between yield and 
additive is not linear, yield declines above 4.5% 
additive; suggest additive 4.5%, temperature 90°. 
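Each data set in Exercise 2a has two answers: the regression line of y on x and the line of x on y. The Python sketch below computes least squares coefficients from the usual summary quantities; the data are hypothetical, and the x on y line is obtained simply by swapping the roles of the two lists:

```python
def least_squares(xs, ys):
    """Least squares regression line of y on x.

    Returns (a, b) for y = a + b*x, where b = S_xy / S_xx and a = y_bar - b*x_bar.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # so the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data, chosen so that y = 1 + 2x exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
print(least_squares(xs, ys))  # y on x: (1.0, 2.0)
print(least_squares(ys, xs))  # x on y: (-0.5, 0.5)
```

When the points do not lie exactly on a line, the two regression lines differ and both still pass through the mean point (x̄, ȳ).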


Exercise 2b Product-moment correlation 
coefficient (page 145) 


1. (a) 0.930, strong positive correlation 
(b) —0.828, strong negative correlation 
(c) 0.867, strong positive correlation 
(d) 0.742, positive correlation. 

2. 0.82 


* 0 ee agene appears to be linked to high wage 
inflation, so suggestion justified. 

4. 0.79 

5. 0.73, y = 254 + 0.53x, x = 94.4 + 1.01y

6. 0.60, W=-76 + 0.89 h 

BOTT 

8. -0.415 

9. (a) 0.954 (b) 2,3 

10. (a) [Scatter diagram; vertical axis: heart mass (mg), horizontal axis: body mass (g)]


(b) y= 48.35 +2.75x 
(c) 0.787 
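The coefficients in Exercise 2b can be checked against a direct implementation of r = S_xy / √(S_xx S_yy). In this Python sketch the two small data sets are hypothetical; the last line uses the fact that r² equals the product of the gradients of the y on x and x on y lines, which is consistent with the answer to Question 5 (0.73, with gradients 0.53 and 1.01):

```python
import math

def pmcc(xs, ys):
    """Product-moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

print(pmcc([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # 1.0  (points on a rising line)
print(pmcc([1, 2, 3, 4, 5], [11, 9, 7, 5, 3]))  # -1.0 (points on a falling line)

# |r| recovered from the two regression gradients (Question 5).
print(round(math.sqrt(0.53 * 1.01), 2))  # 0.73
```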


Exercise 2c Spearman’s coefficient of rank 
correlation (page 151) 


1. 0.26 
2. (a) 0.43
(b) Some agreement between average attendance ranking
and position in league, high position in league
correlating with high attendance.
4, 0.033, little or no correlation. 
5. -0.62, some agreement between the scores. 


6. (b) (2.275, 38.375)

(c) Ranking both p and d from lowest to highest gives 0.839.

(d) In general the population density is greater near the centre of the town and less on the outskirts of the town.

(e) H, low population density and distance from centre of town.

7. (a) 0.3, 0.5, 0.7

(b) Mrs Brown and John; 1) Headrests 2) Heated rear window 3) Anti-rust treatment.

8. —0.036, no agreement. 
9. 0.84, strong positive correlation between number of years 
smoking and extent of lung damage. 


10. (a) [Sketch] (b) [Sketch]
(c) -0.92 (d) -0.9
11. (a) [Sketches]
0.60, 0.60
12. (a) 0.7, good agreement between judges.
(b) [Sketch]

13. (a) (i) -0.976 (ii) -0.292 (or 0.292)
(b) The transport manager’s order is more profitable for the seller, so the saleswoman is unlikely to try to dissuade the manager.
(c) (i) No, maximum value is 1
(ii) Yes, higher performing cars generally do less mileage to the gallon.
(iii) No, the higher the engine capacity, the dearer the car.
(d) When only rankings are known; when relationship is non-linear.
non-linear. 
14. 0.84; very good agreement between the rankings indicating 
strong positive correlation between the marks in English 
and the marks in History; E. 


Miscellaneous exercise 2d (page 160) 


1. (a) y = 3.07 + 1.17x
(b) When the y variable is the controlled or independent
variable. 


2. (a) t on w is required; t = 18.8 - 0.853w
(b) (i) -13.6°F (ii) -28.1°F
(c) -0.946, points lie close to the regression line.
(d) Good estimate for w = 38, since strong correlation. Estimate for w = 55 needs to be treated with care since extrapolation (outside range of data) is unreliable.
3. (a) Strong negative correlation
(b) y = 6.85 - 0.0072x
(c) pH = 6.85 at t = 0°C; for an increase of 10°C, pH drops by 0.07
(d) 6.71, reliable; 6.17, unreliable, outside range of data
(e) 48.6°C




[Scatter diagram]
y = 23.0 - 0.267x
7500. There isn't a wide degree of scatter, so the estimate could be reliable, but in general it is unwise to extrapolate outside the range of the data.
No. The points do not lie in a line.
[Scatter diagram]
0.935
(b) indicates strong positive linear correlation and the diagram confirms this is appropriate.
p = 2.58 + 0.8873 15
Ym, page 121 diagram 3
y = 7.77 - 0.005x
5.77; treat with caution as outside range of data.
The lower the percentage moisture content, the greater the heat output.
-0.901, strong negative correlation: the greater the number of items finished, the lower the mean quality score.


[Scatter diagram; vertical axis 4.0 to 8.0, horizontal axis 10 to 30]


Amend; possibly negative trend but not strong 
correlation, (32, 3.7) is an outlier 

Ignore outlier; weak negative correlation between 
number of items and quality score. 


8. y= 0.65 + 0.0157x; 
Rate of about 1 hour per mile distance; 3 days 19 hours; 
out of range of data, travel across water required; 0.942, 
strong positive correlation, points close to regression line. 
9. (a) y= 12.033 - 0.009x 
(b) 8.6 per 1000 
(c) Decreasing number of members of population per 
doctor not effective in reducing infant mortality rate. 
10. (a) Spearman 0.613; grades given
(b) Product-moment, 0.95; numerical data given
(c) Students performed at a similar standard in the
written and listening tests, but not in the oral test. 
Standard in oral test related more to listening 
performance than written performance. 


11. (a) [Scatter diagram]
(b) p = -0.54 + 1.2n; £17
(c) 0.998; the points will be close to a line with positive gradient.
12. (a) 0.96
(b) points lie close to a straight line with positive gradient (strong positive correlation).
(c) Equal to 0.986 since rankings will not change.
13. (a) 0.0705 (b) y= 0.34 + 0.0085x 
(c) 0.477 (d) unreliable since outside range of data 
14. (a) [Scatter diagram]
Mean (4, 16.7).
(b) average decrease of 1.80°C per month
(c) y = 23.9 - 1.80x
(d) 23.9°C; regression line is valid only within range of data.
15. [Scatter diagram; vertical axis: output, horizontal axis: body mass]
0.91, strong positive correlation.


16. (a) 6.98 (b) y = -7.42 + 1.115x, x = 6.82 + 0.862y
(c) 8.20 tons per acre (d) 13.9 cm
17. (a) y = 41.79 + 1.55x (b) SL
(c) 43, but treat with caution as outside range of data.
18. (a) -0.9 (b) 0 (c) 0.9 (0.6 without outlier)
19. (a) [Scatter diagram; horizontal axis: number of items (1000s), vertical axis 0 to 100]
(b) Diagram suggests a linear relationship
(c) y - 61.1 = 0.966(x - 42.4)
(d) y = 20.1 + 0.966x
(e) Initial costs are approx. £20 000, cost increases by approx. £1 per item
20. (a) [Scatter diagram; vertical axis: temperature (°C), horizontal axis: time (mins)]
(b) y = 0.142 + 0.389x
(c) 23.2°C, outside range


Mixed test 2A (page 166) 


1. (a) y = 3.667 + 0.038x (3 d.p.)
(b) Mathematically a = 3.667 would indicate a yield of 3.667 tonnes with no water at all; in practice this would be nonsense. b = 0.038 indicates yield increases by 0.038 tonnes for every extra centimetre of water.
(c) 4.7 tonnes (only just outside range, probably reliable), 9.3 tonnes (well outside range, unreliable).
2. (a) y = 14.55 + 1.02x
(b) Initial temperature of milk.
(c) 19.14°C, 34.95°C
(d) First; second outside range of data.
(e) [Scatter diagram]
(f) Temperature would stabilise at room temperature.
(g) Points appear to lie on a curve, reaching a limit at room temperature.

3. 0.714

4. (a) -0.975; points lie close to a straight line with negative gradient (strong negative correlation).
(b) -1, complete disagreement in the rankings.
(c) (iii), data follow a non-linear relation.


Mixed test 2B (page 167) 


1. (a) Points lie close to a line with negative gradient.
(b) y on x, y = 7.22 - 0.69x; 4.45
(c) Depreciation of £700 per year.
(d) (i) No, outside range of data.


(ii) Yes, since x is controlled. Use x =2—4% 


2. [Scatter diagram]
0.515; 8.8 hours; regression line gives average value. Points not that close to line as r = 0.515; Σd_i² minimised, where d_i is the vertical distance from point to line.
3. (a) [Scatter diagram; vertical axis 0 to 400, horizontal axis 0 to 120]
(b) y = -107 + 3.21x
(c) 214, overestimate (data fit a curve); 375, underestimate (outside range, also non-linear relationship), unreliable
(d) No, better to use a curve.
4. (a) 0.714
(b) Same, since there is no change in the rankings.
(c) $\sum d^2$ would decrease, therefore $1 - \frac{6\sum d^2}{n(n^2-1)}$ would increase.
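Question 4 of Mixed test 2B turns on Spearman's formula r_s = 1 - 6Σd²/(n(n² - 1)): anything that reduces Σd² pushes r_s up, and completely reversed rankings give -1 (as in Mixed test 2A, Question 4(b)). A minimal Python sketch, assuming two rankings with no tied ranks (with ties this formula is no longer exact):

```python
def spearman(rank_x, rank_y):
    """Spearman's rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)).

    Assumes two rankings of 1..n with no ties.
    """
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rankings of the same length, n >= 2")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

print(spearman([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))            # 1.0, identical rankings
print(spearman([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))            # -1.0, complete disagreement
print(round(spearman([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]), 6))  # 0.9, one adjacent swap
```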


Chapter 3 


Exercise 3a Elementary probability (page 173) 
z a 3 (b) 1 (c)} 
- (a) 0.375 (b) 0.62. 
coe i ae 5 (c} 0.75 (d) 0 (e) 0.8 
4. (a) 0.4625 (b) 800 
5. 0.73 


& GOs WA Gy ® ) 8 
8 (a) (b) 3

a+ 

10. 0.27 (2 d.p.)

ats fa) 3 Gi) 33 (b) $ 

13. (a) ae (b) ~ (ce) £ (ay $ 

14. (a) fi) & (ii) ob (iii) 0 *b) £36 or 12 


Exercise 3b Probability: combined events


1. } 4 3 
5 fy 2 (b) 4 (c) g 
= {a) 6 bt OS Ws 
as 
5. (a) 0.5 (b) 0.4 (c) 0.2 (d) 0.14
6. (a) (i) 5 (ii) $ (iti) Bb) 0.2 
7. (a) ¥ (b) 0 (ce) f 
8. 0.6 
9. 0.7 
oe Me we (Ok de 
12. (a) un {b) ui 5 
13. Yes
15. At least one tail is obtained; both coins show tails.
16. (a)
                 Fruit tree   Other   Total
   Birds' nest        2          4       6
   No nest            5          9      14
   Total              7         13      20
(b) 0.45 (c) 3


Exercise 3c Combined events (page 192) 


1. (a) § (b) 0 

2. (a) 0.05 (b) 0.5 

3. (a) 0.15 (b) 0.65; no

4. (a) ip (b) ih 

55 

6. {a} 5 (b) d 

7.4 : 

8. (a) 0.5 (b) 0.35 (c) 0.375 (d) 0.4

2) me Oe (Fh 

10. (a)
                   B     G     Totals
   Passed         16     8       24
   Taken, failed   7     6       13
   Learning       10     8       18
   Too young       2     3        5
   Totals         35    25       60


) @ ( # dB @ (p Bs 
11. (a) Independent; obtaining a head when a coin is tossed ...
(b) Mutually exclusive, 0.


12. 3 

13. 0.5 

14. (a) § (bh) & () B 

15. (a) 0.2 (b) 0.03 (c) 0.32
me B (b) i {o) § 

(a ds WR dS OB wi 
18. {a} 0.5 (b) (i) Sp (i) 4p yea ne 
19. (a) 0.1 (b) 0.3 (c) 0.45
20. (a) 5 (b) } 

21. (a) f(b) 3 () § 
22. (a) 0.15 0b) & (ch 4 


Exercise 3d Tree diagrams (page 200) 


Section A 


» 0.78 
» (a) 


0.5 (b) 0.5 (c) 0.375 
8 

0.02 (b) 0.64 

a b) 8 oe 


0.34 (b) 0.063 (c) 0.19 (d) 0.97; 3 white 


(b) $ (ce) & (d) 4 (e) 4 


Section B 


1 
2, 
3. 


ie 


Be east 
SR Foyvena 


a 
= 


15. 
16. 
17. 


(a) 
{a) 
(a) 
(i) 
(b) 
{a) 
(a) 


(b) 


(a) 
(a) 
(a) 
(a) 
{a) 
(i) 


& (b) § 

3 (b) 8 (6) 8 

P(A occurs, given that B occurs) 
mutually exclusive (ii) independent 
0.88, 0.05 

0.33 (b) % 

a (b)e Ce) $ 


is 

& (b) (i) & (ii) ds 

as (b) YE (0) 3 

0.096 (b) 0.156; 

0.7, 0.68 (b) 0.28 (c} 0.65625 

ahs (b) 33) sd) 

Yes, no {ii) No, yes 

0.000877 {b) 0.421 (c) 0.65 (d) 0.642 
0.042875 (b) 0.142 (c) 0.1215 

0.189 (e) 0.334125; 0.642 

@) Gi) & Gli) A iv) 4 

{i) 0.0303 (ii) 0.450 (iti) 0.0348 

(i) 0.36 (ii) 0.848 
8 


Exercise 3e Useful methods (page 206) 




(a) 0.763 (b) 14
(a) 5 (b) 6

Exercise 3f Arrangements, permutations, combinations (page 219)
ea 
2. (a) 6t (b) $ 


1. 


= 
= 
8 
S 
ote 


- (a) 210 (6) & () 3b 


~ (a) i (b) 5 (e) 36 
. (a) 65268 (b) 4263 


. (a) 1260 {b) 2520 

. (a) 420 (b) B252,G 462 (c) 120 (d) #5 

. (a) 5040 (b) 1680 (c) 672 | 
. 5005, 720, 72 { 
. 5040 (a) 144 (b) 1440 

. (a) 2.5x 1077 (b) 3 193 344 i 
fa) $ (b) § i 
. 130 

. (a) 360 (b) 6 (c) 12 {d) 1170 

. (a) 64 (b) 18 () ¥ 

. (a) 9 (b) & (c) 1260 (d) § 

(a) 75 (c) BE Cd) Gi) GL ii) 72 

. 70, (a) 55 


(b) 30 i 
(c) 65 H 
(d) 3 
(e) 4 
() 4 


Miscellaneous exercise 3g (page 228) 




» (a) 


. {a) 0.36 (b) 0.48 (c) 0.01024 (d) 0.98976 
. f(a) C,C’ (b) C,D (c) CE 

. (a) 0.0902 (b) unsatisfactory test 

. 0.32, 0.467 

(a) 0.325 (b) fe) & 

. (a) 0.28 {b) (i) 0.157 (ii) 0.363 (iii) 0.163 


(c) 0.0728 (d) 0.404 


. 0.166, 0.580

. 5040 (a) 720 (b) 1440

(a) ts b&b OB Gas (ha (8 6 

. (a)  (b) (cl te) ie 

. (a) (i) 0.005 (ii) 0.0955 (b) 0.999 (c) 0.136 
(a) (i) } Gi) 5 Gi 1b) & O 

, 5005, 1960, 319 (a) % (b) % 

. (a) 792 {b) 210 (ce) ik {d) 120 (e) 0.1 (f) 0.4 
. (a) 40320 (b) (i) 1440 {ii} 5760 


{) i) 4 Gi) § (d) S76 fe) 
(b) 4 (c) independent 
(d) 45 P(A[C) + P(A) {e) & 


J ms 


Mixed test 3A (page 231) 


[Tree diagram with branches for first draw, second draw and third draw]


(b) & () 3 (dy $ 
3. (a) 0.4 (b) 0.2 (c) B 
4. (a) 3 (b) HB (c) Bh (d) 30 


(e) The probability that a female employee is weekly paid. (f) 0.5
5 @) i be Cs FZ 
Mixed test 3B (page 232) 
1. (a) 0.64 (b) 0.75 


2, -0.25 (b) 2 & 
(a) q {b) 5g =D () B 


3. (a) 0.857 (b) 0.135 (c) 0.13917 (d) 0.973 


7 {a} 0.1792 {b) 0.1686 {c) 0.203 


(a) [Tree diagram covering 1997, 1998 and 1999]
(b) 0.372
() & 
(d) 8 
Chapter 4 


Exercise 4a Probability distributions (page 236)


1. (a) [Diagram of P(X = x) for x = 0, 1, 2, 3, 4, 5]
(b) (i) 0.85 (ii) 0.55 (iii) 0.5 (iv) 3


2: 
ie 12 13 14 
pnd 
PXex) [10k | gk ge |” 
3 04 
7 
es < w (b) 4 (c) 0 (da) B 
x: 0: ] - 2 
P(X = x) 4 4 . 
(b} = Ls 
Men lid | 
6. 
* 0. 1 2 a 
P(X=x) |b 24 a 
7. (a) 4 
F {c} 5 
. P(X =x) =0.1,%=0,1,2,... 
9 (a) > 925 say 


2a 8 ‘6 
ST - 
(Ae ee a 


a 
*. Teocs8 oO AO an ao 


443 Equally likely outcomes 


12. For x = 8, draw a vertical line to 0.2; for x = 9, draw a vertical line to 0.3; symmetrical distribution.


Exercise 4b Expectation (page 244) 


1 
2. 


. 2.25 
ay 
. (a) 0.3 (b) 2.9 
1 
iz 
Tf 
0.75p 
*: 10 20. 
P(X&=x) | 04 0.6 
(a) 0.3 (b) 0.2 
. (a) 0.2 {b) 2.08 
. 2.75 


x 4 6 88 
P(X = x) | 0.16 0.32 0.16 0.16 0.16 0.04
Loss of £1.20. 


12, £807 +x) {a) 5 (b) Loss of £3.75 
13. (a) 24 
©) ees aa 84 
Pees | bo $d 8 
(d) 1 
14. 3
15. 2


Exercise 4c Expectation and variance 
(page 251) 

1. (a) 2.3 (b) 5.9 (c) 0.61

2. (a) 0.35 (b) 4.2

3. (a) 145 (b) 245 (c) 1245

4. (a) 3.5 (b) 154 (c) 14.5 (d) 2p

(a) 2.56 

a) 3.5 (b) 14 (c) SS (d) 84 (e) 1.75 
a) 2 (b) 3or-3 


oe eB toe 
eof be ke 


(c) 4 
10. (a) 4.2. (b) 7§  (c) 3.67 ; 
11. fa) a (b) 3} (c) 15% Cd) 285 fe) 4745 
12. (a) 14 (b) 35 (el § 

13. [TE a 


r§ mF OB Oe 
14, " $ qb) 25 () 10 (d) 10 
15. 144
16. (a) § {b) 0.639 


P(X =x) = (1G), = 1,2, 
18. (a) & (b) 0 (c) 6 (d) 2.45 
19. (a) 0.04 (b) 5 (c) 4 (d) 7 (e) 16
20. (a) Loss £3 (b) (i) p = 0.12, q = 0.08 (ii) 645, 8
21. (a) £2 (b) (i) 4 (ii) 17 (iii) 1
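The variances in Exercise 4c come from Var(X) = E(X²) - [E(X)]², which is why the first answer lists 2.3 and 5.9 and then 5.9 - 2.3² = 0.61. A short Python sketch, assuming the distribution is supplied as exact fractions so that the sum-to-one check is exact (a float version would need a tolerance); the fair die is only an illustration:

```python
from fractions import Fraction

def mean_and_variance(dist):
    """E(X) and Var(X) = E(X^2) - [E(X)]^2 for a discrete distribution.

    `dist` maps each value x to P(X = x), given as exact Fractions.
    """
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in dist.items())
    mean_of_squares = sum(x * x * p for x, p in dist.items())
    return mean, mean_of_squares - mean ** 2

# A fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean, var = mean_and_variance(die)
print(mean, var)  # 7/2 35/12
```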


Exercise 4d Cumulative distribution function 
(page 255) 


fe ee 


PIY<y) (0.05 0.3 0.6 0.754 
2. (a) 0.41 (b) 0.87 (c) 0.46 (d) 0.13 (e) 2.58
3. {a) T z 


S 


ba 


Fa) | ot 


b) Fe Te 2 3 48 
. 3 


Pa) | 


Ofey yo 1 23 


Fel | be eS 
4. (a) x        | 3    4    5    6    7
      P(X = x) | 0.01 0.22 0.41 0.22 0.14
30.9724


2x-1 
5.) $ 0) 4 (6) Kaa OF 
6. (a) § (b) }_ (o) PK =x)=},x=1,2,3 (d) 0.816 
7.) Fer 2s 
Px=s [4 1 Ra 


fe} 2d 0.547 (d) 4 
8. (a) 0.9900 (b) 0.1746 (c) 0.5886 
(d) 0.5565 (e) 0.9785 


Exercise 4e Combinations of random variables 


(page 261) 


1. (a) 26 (b) 15 (c) 17 (d) 59 (e) 59
2. (a) 0 or 12 or -12 (b) 294

3. (a) 1 (b) -1 (c) 34 (d) 14 (e) 14 (f) 30
4. (a) 1.3, 1, 1.01, 0.8 


(b) wey 0 4 2 
PiX+¥=x+y) | 012 0.14 032 


x+y al 3 4 5 
P(X+Y=*x4+y) 0.2 O18 0.04 


() Fee) 6 
042 0.44 0.32 


| Px ¥ex=y) 


xy. 1 2 3 
PIX-Y=x-y) | 0.2 0.18 0.04 
ps 


5. (a) 2.6,0.24 (b) 5.2, 0.48 (c) 7.8, 0.72 
6. 29%
7. (a) 0.1 (b) 3 (c) 1 (d) 0.2 (e) 12 (3


Miscellaneous exercise 4f (page 266) 


. 0.1825, £1.75 


. (a) (b) 28 
(a) 0.01 (b) 3.54, 0.4684 (c) 14.7, 11.71


. YO, 2.57 
ay, 3.5, 1.25, 12, 20 


» (a) FE 2 eS 
PX=x) | bb 




tole 
oe 


f 
Blo 
‘She 
Seto 
a 
Sl 


PY=y)| di A te de 


65 373 
7. ty Bs to 8 
8. (a) ¢ (b) d Qo 2 3 | @ 125 
See ee 
PO-d | 4b Ak 
9. (a) 0.1248 (b) 2.8352, 236 
10. (b) b | 0 i 


‘ite 
def 
loo'} 9 
aia | 
ee) 


pp=6) | 4 


() 18% (e) 

y 0 i 2 3 
PY=y) | 03 034 02 016 

(a) 1.22 (b) 1.0916 (c) 0.36


12. 
BEE [228 Sa $e TERS OO Te 
PX=)|s 8 ee A HAR RR 

7.2, £75 


13. (a) ge (b) & (©) Ys-4,7 
14. (a) 4 (b) 2.78 (c) 0.260
15. (a) 0.8 (b) -0.24p (c) 3.34p²
16. (a) 1.7, 1.18 (b) 4.76 

17. (a) 1,4 (b) 3,& (c) 11.2, 7.28


5 
t 0 1. 2% 
mre) |b Be 


ae fw 
be [as 
eet | 


18. P(X=x)=1, x=1,2, 3,4, 5; P(X =6) = 0; P(X =x) <2, 

¥=7,8) 041; 46, 6 aia 

Mixed test 4A (page 269) 

1. (a) 0.2 (b) 8 (c) 11.6 

2. (b) x OSE ages 
POS a) Pe be 


(c) 15 (e) 0, 2 


3. {a) z 
ah fe Be book 


(b) 1G te) $ 


Mixed test 4B (page 269) 
1. (a) 0.4 (b) 0.8 (c) 2.6 (d) 1.44 (e) 15.6


26) [orgy [Ames One Ronee 
PXeay [SP OP PS Pe 


(b) 64, 431%, 34} {c) Loss of £1 (d) & 


3. (a) = ; 5 5 ri Fay 
P(SSs)0 fod pope y 


(b) 4.5 (c) 14
Exercise 5a The uniform and geometric distributions (page 276)


1. (a) 0.2 (b) 8 (c) 0.4

2. (a) 9.096 (b) 0.179 (c) 0.725 (d) 2.86
3. (a) 0.9744 (b) 0.01024 (c) 1 (d) 13 (e) 2.5
4. (a) 1 (b) 0.7599

5. (a) 0.0226 (b) 0.00374

6. (a) (i) 0.6 (ii) 0.3 (iii) 4.5 (iv) 2.87
(b) (i) 0.0531 (ii) 1 (iii) 10

7. (a) 1 (b) 2 (c) 1.41

8. (a) 0.128 (b) X ~ Geo(0.2) (c) 0.512

9. (a) P(X = 4) = 0.7³ × 0.3 = 0.1029

(b) The first success is at the ath attempt.


(c) There are at least # attempts before the first success is 
obtained. 
10. 0.7225 
11. 0.00026 
2 


° 
fa 
XN 
ron) 
es 
a 


eh 


eh 


AAP ONE 


13. (a) 0.0864 (b) 2.5 (c) 1.94
14. £1.75 ee) (d) 1 (e} 0.028 


15. 0.0047, December 22nd 


16. (a) $ (b) HE (c) 


42s 
216 


(d) 1
(e) 6
(f) 17
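Several answers in Exercise 5a are direct applications of the geometric formulas P(X = r) = q^(r-1)p, P(X > r) = q^r, E(X) = 1/p and Var(X) = q/p². The Python sketch below checks Question 9(a); it also reproduces the values in Question 8 for X ~ Geo(0.2), on the assumption that those parts ask for P(X = 3) and P(X > 3), and the 2.5 and 1.94 of Question 13, which are consistent with p = 0.4 (an inference from the answers, not stated here):

```python
def geo_pmf(r, p):
    """P(X = r) for X ~ Geo(p): r - 1 failures followed by a success."""
    return (1 - p) ** (r - 1) * p

def geo_tail(r, p):
    """P(X > r) = q^r: the first r trials are all failures."""
    return (1 - p) ** r

def geo_mean_var(p):
    """E(X) = 1/p and Var(X) = q/p^2."""
    return 1 / p, (1 - p) / p ** 2

print(round(geo_pmf(4, 0.3), 4))   # 0.1029
print(round(geo_pmf(3, 0.2), 3))   # 0.128
print(round(geo_tail(3, 0.2), 3))  # 0.512

m, v = geo_mean_var(0.4)
print(round(m, 2), round(v ** 0.5, 2))  # 2.5 1.94
```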
Exercise 5b The binomial distribution
(page 285)


1. (a) 0.267 (b) 0.850 
2. (a) 0.234  (b) 0.000107 (0.0001 from tables) 
3. (a) 0.279 (b) 0.983 (c) 0.594

4. (a) 0.00549 (b) 0.157 (c) 0.503

5. 0.00200

6. (a) 0.318 (b) 0.671 (c) 0.647 (d) 0.0324
7 
8 


10. (a) (i) 0.0424 (ii) 0.623 (b) 12 
11. 0.0963, improve with practice 

12. (a) 0.0105 (b) 0.988 (c) 0.358
13. (a) 0.329 (b) 0.461 


14. [Diagram of P(X = x)]; symmetrical
15. 9


16. 68; not strictly binomial as p is not constant, but model 
can be used if there are a large number of bulbs in the box. 
17. (a) 0.000416 (tables give 0.0004) (b) 0.0197 

18. 5

19.
   x          0       1       2       3       4       5       6       7
   P(X = x)   0.0000  0.0001  0.0011  0.0109  0.0617  0.2096  0.3960  0.3206


20.4 

21. Experiment 1 — no, 3 outcomes; Experiment 2 — yes, 
constant probability of obtaining black (or white), 
independent trials; Experiment 3 — no, trials not 
independent. 
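The eight probabilities listed for Question 19 sum to 1 and agree, to within 0.0001, with X ~ B(7, 0.85). That distribution is inferred from the numbers themselves, since the question is not reproduced here. The Python sketch below (using `math.comb`, available from Python 3.8) computes the exact values for comparison; the listed values differ from them by at most about 0.00005 in the last figure:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Values listed for Question 19, x = 0, 1, ..., 7.
listed = [0.0000, 0.0001, 0.0011, 0.0109, 0.0617, 0.2096, 0.3960, 0.3206]

computed = [binomial_pmf(x, 7, 0.85) for x in range(8)]
for x, (c, t) in enumerate(zip(computed, listed)):
    print(x, round(c, 4), t)
print(round(sum(computed), 10))  # 1.0
```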


Exercise 5c Expectation, variance and mode of 
the binomial distribution (page 290) 

#25; 1.5 

2. (a) 1.38 (b) 4

3. 8, 1.30 

4. (a) 0.2 (b) 0.00554 


8. 0.223 (b) 0.116 (c) 9.28,2.86 (d) 18.9 i 3. {a} 0.25 4
5. (a) 0.25 (b) 2.5 (c) 0.282 G Part (c) gives 223, part (d) gives 227, increase | (b) A _ LL. (b} 24 
6. 0.1, 0.23 fa) Large number of balls (b} 0.799 | BOT eer 12. a=2,k=0.75; 
9. {a) Largs »k=0.75; 
7. (a) 10. (b) 0.000390 10. 0.790, calls occur randomly | 0.5 aa fx) 
8. (a) 3. (b) 3 {c) 0.633 11. {a) 0.104 (b) 0.283 (c) 0.00113 {d) 9 | 0.75 fx) = 0.75x(2 ~ x) 
9 (a) 0.994 (b) 2 12. 0.632, 0.069, 0.154 | 0 3x ; 
10. 0, 0, 3, 13, 30, 36, 18 43. fa) (i) X ~B (28, 0.004) (ii) 0.00545 (b) 0.785 i ee 
11. 2500 (c) independence (e) 3 19 37 ° : 2% 
12. 0.06; 293, 94, 12, 1, 0, 0 14, (a) 0341 (b) 0.9595 3.6, 12 4. (a) se (b) 855 38 30.2 
13, fa) 0.68 (b) 8, 1.6 15. (a) 0.253 (b) 3.6, 1.59 a est ase O2 
44, (a) 0.25. (b) 1.5 16. {a) (i) 0.201 (ii) 0.00637 (b) 2. {e) 5,2 {d) 14 \ & (a) 0.125. {(b) A) ‘ oe 
15. 1,0.894 (a) 5 (b) 0.2 17, (a) 0.203 (b) Gi) 0.136 (c) 0.316 rasa = 0.126x Exercise 6d Cumulative distribution function 
: ove nt ee (d) Assume p constant; very unlikely in First World War 0.5 (page 339) a 
Exercise 5d The Poisson distribution ue 3 &) 
(page 297) 18. P= x)=" sha Q 4x L(y Ra=le OS*S2 1 
4. (a) 0,180 (b) 0.0527 (c) 0.495 (d) 0.670 (a) 0.082 (b) 0.242; 6.15 (c) 0.328 ; ee 
2. (a) 0.983 {b) 0.184 (c) 0.199 19. (a) 0.908 {(b) 9 7. (a) 0.25 oe: Se 
3. (a) 0.0821 (b} 0.560 oi 20. fa) 3,7 {b) 20, ase anne ; {b) fw) (b) 1.59 : 
A, (a) 0.603 (b) 0.616 (c) 9. Reason for (a) E(Y - X) # Var(Y ~ , 0.75 fal ¢ eee 
5. (a) 0.0821 (b) 0.242 (c) 0.759 Reason for (b) 2Y + 10 could not take values less than 10. 2 (a) RayalgOr-* 7 1<x83 3 
(d) 0.0486 (e) 0.125 21. 600 m, Po(2.5), 0.0821, 0.109, 0.779, 0.207 0.5 1 23 
6. {a) 0.191 (b) 0.0498 (c) 2.45 22. {a) (ii) 1.5  (b) 0.577 (c) 0.0249 
7 0371 23. 0,407, 0.366, 0.165, 0.0629, 0.816, 0.0518 0.25 ; iia qeges 
8. (a) 0.0382 (b) 0.122 24, (a) 22 (b) 19; 39 3. (a) 5 (b) F)=45 {c) 2 (d) 2.5 
9, 0.677 25. (a) 0.135 (b) 0.323; 0.81 0 3x 1 x26 
10. (a) 3 (b) 0.145 26. (a) 0.387 {b) 0.929 (c) 0.893 ” 
11. (a) 90, 72,29, 81,0 (b) bear (d) 0.208 (e) CE ns (c} 0.25 (d) 0.3125  (e) 0.3475 4 Se Se 
12. Random events; 0.5, 0.481; 31, 16, 4, 1, 27. (a) 0.0902 (b) 0.0613; 7 foe) ax 
13, (a) 0.261 (b) 6 my a Exercise 6b Expectation E(X) (page 323) 4 (a) Fa)= J Gta +4) aex<3 (>) 
: Per? Mixed test 5A (page 1. (a) & (b) 1 (c) 2 {d) 1.6 (e) 24 
Exercise 5e The Poisson approximation to the Si cnee MP daRR , plel . 4 : aes 
binomial (page 300) : Query independence: friends may have joint engagements. 5. (a) 0.1215 (b) 0.841 (c} is 
y i) 0.0476 (ii) 0.0498 (b) (i) 0.225 (ii) 0.224 2. (a) 0.152 (b) 0.567 (c) 0.285 =x O<x<3 
L ts 0 one pee * 4 X ~ (ASO, sh), 1= 1.875, p< 0-1, > 50 ; k 6. (a) 1.5 (b) 0.75 (c) Fx) =43 
2. (a) (i) 0.184 (ii) 0.0190 (b} 0.271 {c) 0.0498 {b) 0.559 (c) 369 ‘ ; 1 x33 
3. (a) 0.287 (b) 0.191 4. (a) X ~ Po(0.6), X is number of boxes in a square km. oe Bee at a (d) 0.4 (e) 0.2 
4, (a) & (b) 0.713 (b) 0.549 {c) 0.0234 b) t (2 
5. (a) 0.647 (b) 0.185 (d) Probably not suitable; different scatter of telephone P My 3 (ce) F(x) 
6. 0.109, 185 boxes in the city. ag 4 6 a i 
a oe iby ae ae PN OE eEe Ne 5. (a) & {b) ® (c) 0.48, money bond 
» (a) 0. ; . 6. 2, 0.124 : 
9. 0.0150 Mixed test 5B (page 313) i 7. 2'5, 0.803, 0.456 ar 
10. (a) 0.47 (b) 0.044 2 (a) S—p)p' (b) t0(1 -p)*p? 8. (a) 2.875 ke (b) £4.75, 2 3 
Poi applies since p < 0.1 and # = $0. Events may not 1.3 (a Pip . . $4 +75; 56 7. (a) 3 F 
be independent. After mis-dialling, you are likely to be 2. {a) Y~ GeotZ) (b) 30° (c) 0.233 B 9. (a) 04 {b) 2.6 (c) 15 4 ay 09) 
A . . pu 
more careful. 3. (a) Binomial (b) Poisson (c) ¢” & Exercise 6c Standard deviation and variance ’ 
11. Random sample, 0.305 YS (page 333) 
: i riables (d) 1-2? 1147} 001s, 0.014, 0.182 
Exercise 5f Sums of Poisson va 1. (a) 1.5 (b) 2.4 (©) 0.15 (d) 0.387 5 
(page 303) 4, (a) 0.221 (b) 0.987 2. fa) 0.5 (b) 24 (c) 24 (d) 1.44 Or: UbDy 3F 
pete 5. (a) 0.249 {b) 0.929 (c) 0.508; 0.542 3. fa) 1g (b) 35 (co) H (d) 0.553 ft 631) 1<xed 
2 fa) 0.189 (b) 0.308. (6) 0.184 4. (a) 1 (b) 1% (c) BE (a) 0.545 (b) 0.272 (c) Fx) ={7 (d) 1.65 
3. (a) 0.323 (b) 0.119 Chapter 6 5. fa) } (b) 3 (ce) & (d) 0.163 1 x>2 
4 (a) 0.301 (b) 0.080 (c) 0.251 Pp & ta 1H) th (6) (a) 0.912 4061, 
¢ ‘ robabilities ~ (a) ig (b) 46s. (c) fq (d) 0.672 319 _|gx-ax* OK<x<2 
Miscellaneous exercise 5g (page 307) Erie Calculating p' 8.) & 13,2 (Vz 8 pgp Fe-j4 16 310,007 
(page 3 9% fa} 1 (bhi (Et (a) B fey ui baa) 
4. 0.752, 0.537 2 (2 B 10. {a) 9. (a) 4,4 
2. (a) 3. {b) 0.223 {c) 0.988 i yy 3 (b) 3 2 Fx) - (a) 3545 i 5B 
3. (a) 0.733 {b) 0.0703 ; » fo) all iensia eta Ye 
4. (a) (i) 0.434 (ii) 0.378 (iii) 0.148 (iv) 0.0401 t ee ‘ iS tZ 2<x63 
(b) (i) 45. (i) 114;N>20 kl 3 x 5 aud 
5. 0,507 ijpiene exe 
6. (a) (i) 0.130 (ii) 0.271 (iii) 0.276; 65, 0.0159 ; ry (b) F)=43 ; 
(b) 90, 3 3 0 3 * 3 ws 5S cx<t 
7. (a) 0.270 {b) 0.350 (c} 0.182 ince ak (b) 3% (©) 12% (a) 1.008 ers Z 
(d) 0.124 (e} £45 {b) 0. F 
10. (a) F(x)


wl wl 


1 4,1 
etext 1éx<3 
(b) 24 (c) Hay= 46 12" 4 (d) 2.16 
f x23 
41. (a) 5 (b) & (ce) ahs 543 tonnes 
Ax) 
= 
16 
0 Le 
Le O<x<t 
4 
2 “4 1565, 0.821 
12. FR)=V1 yey FE 
5 20 
1 xe2 


1 1 
13. (b} 1,2 (c} 0 (d) ARG 


14, (a) i 
8k: 
ag * 


0.0125x7 O<x<8 
{c) PQx)=40.2x-0,8 8<x<9 
x?9 
(d) 0.55 
15. (a) 0.75 (b) 0.2 
0.75x?=0.25x3 O<x<2 


(c) Reel; x2 
16. (a) 0.455, 3 (b) 3.64, 4.95 
in by i<x<9 


(d) 0.288 


(c) F(x) ={In9 
1 xed 
F(x) 
1 
OL Bos 


Exercise 6e Obtaining f(x) from F(x) 
(page 343) 
4. fa) f@)=4,26x<6 
(0) Ato 


{ce} 4 (dj 2 
2. (a) 0.794 (b) 0.75


3. (a) 0.25 (b) f(x) = 1 - 0.5x, 0 < x < 2


{c) 0.586 (d) § 


Fix) 
2 O<x<1 
2 2 
4. (a) 4 (b) f@)= ; tex<2 2? 
0 otherwise 0 i 2 x 
(c) $ {d) 0.553 
@-0 1<x<3 
5) MO=F 1 gy 3 exe7 
12 
0 otherwise 
Ax) 
L 
3 
01 3 8 
(b) 33 (c) 1§ (d) 3.45 (e) 0.595 
6. (a) 1) 


7. (a) 1,-a) (b) Fix) = 427 
1 x23 


c) f(x)=4x?, 0<x<3 
8. i Hey fayo2.0<x<0.s (c) 0.25 (d) 0.144 
Exercise 6f Uniform distribution (page 349) 
4. (a) 4 (b} 4.5 (2) 0.75 Cd} § 


2. (a) 0.5 (b) -3.5 (c) 0.866
3. (a) 5 (b) 0.325 (c) 3 (d) 15
4. 0.4
5. 0.577
6. (a) 4.5 (b) 25
7. (a) a = 3, b = 11 (b) 0.125

o Ge-3) 3<x<t 

(c) Fx) =48 
1 x>il 


8. (a) f(x) = 0.2, -2 < x < 3 (b) 1.44 () 25 (d) -t
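For Question 8 the density f(x) = 0.2 on -2 < x < 3 is the continuous uniform distribution U(-2, 3), and the 1.44 in (b) is consistent with its standard deviation √(25/12). A small Python sketch of the standard mean and variance formulas; reading (b) as the standard deviation is an assumption, since the question is not reproduced here:

```python
import math

def uniform_mean_var(a, b):
    """Mean (a + b)/2 and variance (b - a)^2 / 12 of the continuous uniform U(a, b)."""
    if b <= a:
        raise ValueError("need a < b")
    return (a + b) / 2, (b - a) ** 2 / 12

# The density is 1/(b - a) on a < x < b; for -2 < x < 3 this is 1/5 = 0.2.
mean, var = uniform_mean_var(-2, 3)
print(mean, round(math.sqrt(var), 2))  # 0.5 1.44
```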


Miscellaneous exercise 6g (page 355) 
1. (a) 44 (b) 63
2. (a) 2.4 (b) 20, §, 0.178 
2 O<x<1 
3 
3.) 3 OAV, 1 exe 
3 


0 otherwise 


5. (b) fd 


12. 


13. 


14, 
1s. 


0.8, 


*Y 


-1 QO 1 
1 
Te Pere 
Resign ts. ig am 
(x)= ptge age) ~i<est (d) 0, # 
1- >1 
1s i 


(a) -0.1875 (b) 0.2375 (d) 2 

(b) 2.2 (c) 1.71 (d) 0.264 (e) 0.3645 
(b) 0.5 (d) 0.36 

(b) 30 hrs (d)  (e) 0.0390 


(f) The model does not allow for lifetimes over 90 hours. 


(a) 3.8 hrs, 0.36 hrs² (b) 4 hrs (c) Approx. 60%


(a) 0 (b) 0.15625 (c) (i) symmetry (ii) 0.05


(d) The player might make a similar mistake each time, 
resulting in more hits above the line than below, or 
vice versa. 

(e) The range would reduce.

(b) 75 hours (c) § 

(d) The model does not allow for P(X > 2.5) > 0, since P(X > 2) = 0

(e) Change to exponential model for x > 1.8, say


(a) Fox) 


oO 1 2 3 4 Xx 


1 
=x? O<x<1 
4 
Mxy=42 1 
hoe a ee Pee 
Boddy S38: 
1 x24 


(d) £283.33 (e) 
8, 3, 39 litres 


{a) 2.93” (b) e-{ 


1-0.01(«-10)? O0<x<10 
1 x 210 


F(x) 


0.25 0 10 
ay O<x<1
17 
16. b) FO=) bg ant) gegen (155 (a) 0.89 
17 
“: x22 
17, A=} ie) 
a] 
aq 
4 to 
Sf — — 
ie) ef 2 3 4 * 
24, 72 


12> 96 
a -1<x<0 
18. (a) } (b) f@=2a OK<x< 
0 xeix<-I 
() 4 (d) 0.553 (e) HE 
19. (a) 2.1,1.29 (b) 1, 0.5 


Mixed test 6A (page 358) 


1. (a) f.d. 0.85, 0.76, 1.15, 0.8, 1, 0.9, 0.75, 0.36
[Histogram to show incomes; horizontal axis: income (£), 0 to 800]
(b) ye (c) 120
(d) From original data, 106 have income in this range. In the model, f(x) = 3k, 0 < x < 4 gives too high an estimate; perhaps f(x) = 2.5k, 0 < x < 4 would be better.


2. 4, &, tk, 0.541 
4 


w 

POS-4w) O<wes 
3. (a) wa] sr O- 4) Osu 

1 wes 


(b) 0.650 (c) 0.794 (d) 3.75 (e) Negatively skewed
Mixed test 6B (page 359) 


1. (a) [Sketch of f(x) for 0 ≤ x ≤ 2]
(b) 1.6 (c) 0.327 (d) F(m) = 0.5, F(μ) < 0.5, so m > μ
2. (b) & {c) 0.577 
1 
) 
Chapter 7


Exercise 7a Finding probabilities, where 
Z ~ N(0, 1) (page 367) 


1. (a) 0.8089 (b) 0.8089 (c) 0.1911 (d) 0.1914
2. (a) 0.0359 (b) 0.2578 (c) 0.9931 (d) 0.9131
(e) 0.0049 (f) 0.9911 (g) 0.9686 (h) 0.2343
(i) 0.0312 (j) 0.9484 (k) 0.9803 (l) 0.0021
3. (a) 0.05 (b) 0.05 (c) 0.0999 (d) 0.025 (e) 0.005
(f) 0.01 (g) 0.0025 (h) 0.075


4. (a) 0.044 (b) 0.8185 (c) 0.1336 (d) 0.3023
5. (a) 0.1703 (b) 0.5481 (c) 0.3639 (d) 0.4582. 
(e) 0.4798 (f) 0.9624 (g) 0.0337 (h) 0.9082 
(i) 0.2729 (j) 0.030 (k) 0.925 (l) 0.4508
(m) 0.9 (n) 0.02 


6. 
7. (a) 0.9 (b) 0.7 
8. (a) 0.55 (b) 0.45 
9, (a) 0.9 (b) 0.1 


Exercise 7b Finding probabilities using X ~ N(μ, σ²) (page 370)

1. (a) 0.0668 (b) 0.4013 (c) 0.1747 

2. (a) 0.7054 (b) 0.0618 (c) 0.4621 (d) 0.00456
3. (a) 0.0548 (b) 0.1448 (c) 0.9544

4. (a) 0.0106 (b) 0.9857

5. (a) 0.3015 (b) 0.5231 (c) 0.3792
6. 740

7. 0.00003844

8. (a) 0.6554 (b) 8

9. (a) 0.0478 (b) 0.000817

10. (a) 0.9544 (b) 0.5784 (c) 0.0435
11. (a) 0.1056 (b) 0.7734 (c) 0.6678
12. 0.159, 0.775, 0.067, £37.56

13. 0.785, 0.397 

14, 0.957 


Exercise 7c Using the standard normal tables 
in reverse (page 376) 
1. (a) 0.018 (b) 0.796 (c) -1.887 (d) -0.454

(e) 0.562 (f) 1.019 (g) 0.842
2. (a) 1.94 (b) -0.695 (c) -0.915 (d) 0.722 
3. (a) 0.91 (b) 1.66 {c) 0.674 (d) 2.05 
4. 0.674, -0.6745 0.524 
5 
6 


= 


(a) 70 (b) 4.65 (c) 190.742 (d) 1.468
. (458.92, 546.52) 
(a) 0.6247 (b) 629.528 (c) 3 
8, 1.158, (6.10, 9.90)
(a) (384.32, 415.68) (b) (394.608, 405.392) 
. (a) 0.9332 (b) 0.383; 106.6, 137
. (a) 0.0548 (b) 26 (c) 67.4 (d) 2183
. (a) 37.8% (b) (125.5, 194.5) (c) 0.405


Exercise 7d Finding μ or σ or both, where
X ~ N(μ, σ²) (page 381)


30 

10.7 

8.31, 35.9% 

35.5 

1.75 

. 52.73, 11.96 

. 2.74, 2.78 

. (a) 6.99, 0.324 (b) 0.0105 




9. 39.5, 5.32 
10. 53.87, 16.48 

11. 0.203

12. 92.7%, 1.32, 1.7% 


16. (a) 0.4875 (b) 281, 5.00 

17. 5.2007, 0.00346; 0.0269 

18. (a) 0.1587 (b) 128.4 (c) 1.31

19. 0.0401 (a) 0.459 (b) 0.003

20. 490 g, 12.22 

21. (a) 19.50 (b) not symmetrical (c) 32 


Exercise 7e Continuity corrections (page 386) 


1. P(2.5 < X < 9.5)
2. P(3.5 < X < 8.5)
3. P(10.5 < X < 24.5)
4. P(1.5 < X < 7.5)
5. P(X > 54.5)
6. P(X > 75.5)
7. P(45.5 < X < 66.5)

8. P(X < 108.5)

9. P(X < 45.5)
10. P(55.5 < X < 56.5)
11. P(400.5 < X < 560.5)
12. P(66.5 < X < 67.5)
13. P(X > 59.5)
14. P(99.5 < X < 100.5)
15. P(33.5 < X < 42.5)
16. P(6.5 < X < 7.5)
17. P(X > 508.5)


Exercise 7f The normal approximation to the 
binomial (page 389) 


1. 0.1958
2. (a) np > 5, nq > 5 (b) 0.0197 (c) 0.0968
3. (a) 0.0154 (b) 0.8145 (c) 0.02
4. (a) 0.657 (b) 0.2142
5. (a) 0.0318 (b) 0.8345
6. (a) 0.9474 (b) 0.6325 (c) 0.5914 (d) 0.0111
7. (a) 0.4502 (b) 0.0996 (c) 0.484
8. 20, 16, 0.00436
9. $P(R = r) = \binom{n}{r}(1-p)^{n-r}p^{r}$, $np$, $np(1-p)$
(a) 0.2304 (b) 0.9222; 0.8531
10. 0.1432
11. 0.6886
12. np > 5, nq > 5 (a) 0.1853 (b) 0.1838 (c) 0.81%
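The answers in Exercise 7f rely on the conditions np > 5, nq > 5 and on a continuity correction (Exercise 7e). The Python sketch below compares an exact binomial probability with its normal approximation for B(100, 0.5), an illustrative case not taken from the exercise; Φ is computed from `math.erf` rather than read from tables:

```python
import math

def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k + 1))

def normal_approx_cdf(k, n, p):
    """P(X <= k) approximated by P(Y < k + 0.5), where Y ~ N(np, npq).

    The + 0.5 is the continuity correction.
    """
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError("approximation needs np > 5 and nq > 5")
    mu = n * p
    sigma = math.sqrt(n * p * q)
    return phi((k + 0.5 - mu) / sigma)

exact = binomial_cdf(50, 100, 0.5)
approx = normal_approx_cdf(50, 100, 0.5)
print(round(exact, 4), round(approx, 4))  # 0.5398 0.5398
```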




Exercise 7g The normal approximation to the 
Poisson distribution (page 390) 


1. (a) 0.6201 (b) 0.39 (c) 0.5406

2. (a) 0.3998 (b) 0.2004 (c) 0.3661 (d) 0.0637
3. (a) 0.313 (b) 0.5078 (c) 0.8335 (d) 0.1104
4. (a) 0.2614 (b) 0.2343 (c) 0.0558

5. 0.8901

6. 0.6887, 4

7. (a) 0.4574 (b) 0.173 (c) 0.8312

8. (a) 0.4594 (b) 0.5363

9. (a) (i) 0.9815 (ii) 0.3486 (iii) 0.9244 (b) 0.0094
10. (a) 0.199 (b) 0.185; 0.870

11. (a) 0.927 (b) 0.0102; 0.297

12. (a) Weevils are randomly scattered in the grain, the grain is selected at random.
(b) (i) 0.950 (ii) 0.105 (c) 0.158
13. (a) 0.953 (b) 0.745 (c) 0.19
14. (b) 0.133 (c) 11 (d) 0.7119


Miscellaneous exercise 7h (page 398) 


1. (a) 46.5% (b) 0.532m (c) 100M 
2. (b) 0.0693 (c) 0.0746
3. (b) 11.5%
4. 50,154, 4
5. (a) (i) 0.0062 (ii) 0.5598 (b) 7.49 m (c) 0.27
(d) Brian, since P(X > 8) = 0.0207 whereas for Alan P(X > 8) = 0.0062.
6. (a) 0.886
(b) Data not symmetric but showing a positive skew.
7. (a) 1.2 (b) 53.6 (c) 54.2; 0.066
8. (a) (i) 4.95% (ii) 0.1
(b) (i) 105.3 (ii) 106.45; 106.45
(c) (i) 103.3, 3.98
(ii) needs overhaul, standard deviation too high.
9. (a) 14.25p (b) 736 g (c) 462 g
10. (a) (i) 0.250 (ii) 0.758 (iii) 0.00240 (b) 0.0433
11. (a) (i) 0.197 (ii) 0.820 (b) (ii) 19 (c) 0.2142
12. 0.360, 0.734
13. (a) 0.653 (b) 0.2224
14. (a) (i) 104 (ii) 33 (iii) 33 (b) 1000, 200
15. (a) 0.3154 (b) 0.3068; worse, 0.5245
16. 979.27, 17.27, 133
17. (a) random events, mean = variance
(b) 0.224 (c) 0.586 (e) 0.6201
18. (a) 0.988 (b) 0.855
(c) 0.783 (Poisson), 0.784 (binomial)
19. (a) 0.649 (b) 0.965 (c) 0.371
20. (a) 0.988 (b) 0.624 (c) 0.828
21. (a) np > 5, nq > 5, X ~ N(np, npq)
(b) p < 0.1, n > 50, X ~ Po(np); 0.859
(c) 0.204 (d) 0.034


Mixed test 7A (page 401) 


1. (a) 29% (b) 402.62 ng/ml 
2. (a) 25 (b) 0.673

3. (b) 0.0113 (c) 0.86 

4. (a) 0.0548 (c) 0.356 


Mixed test 7B (page 402) 


1. Luxibrite, 0.936 
2. (a) 0.1056 (b) 0.8641 (c) 815.68
3. (a) (i) 0.8944 (ii) 0.4934 
(b) only able to stay for a maximum of 60 minutes 
(c) mean + 3σ gives 6.55 pm
4. (a) 7.5 (b) randomly scattered (d) 0.901 (e) 0.2627
5. (a) (i) 0.0808 (ii) 0.1935
(b) 0.295 (c) 0.0598


Chapter 8 


Exercise 8a Sums and differences of normal 
variables (page 409) 


1. (a) 210, 625 (b) X ~ N(210, 625)
(c) 0.6554 (d) 0.7698

2. (a) 0.1319 (b) 0.0127 

3. (b) 0.9324 

4. 0.0745 
5. (a) 0.5 (b) 0.8849 (e) 0.2779

6. (a) 0.0207 (b) (i) 0.0289 Gi ii 
ean (9 0. 9 (ii) 0.0200 (iii) 0.6252 
8. (a) 0.6298 (b) 0.1056 

9. (a) 0.1728 (b) 0.6127

10. 0.2575


11. 0.1103, 0.753

12. 9.6, 0.522; (a) 1.8% (b) 22.2% 

13. (a) (94.4, 105.6) (b) 92.55% (c) 22.14%
14. (a) 0.0787 (b) 3.02 × 10⁻⁶


Exercise 8b Multiples of normal variables (page 413)
. (a) 0.8962 (b) 0.9386 

. (a) 0.2398 (b) 0.2523 

- (a) 0.244 (b) 0.659 (c) 0.409 


(a) 6, √2 (b) 0.2074 (c) 0.760
0.2762 (<) 1 (d) 0.5143 


- (a) 0.3446 (b) 0.6915; 0.0033, 0.304 




Miscellaneous exercise 8c (page 417) 


. (a) 0.60 (b) 0.20 (c) 0.95 (d) 0.5

. (a) 0.054 (b) 0.00155 (c) 0.9782

. 1000, 172, 3000, 298, 0.16, 0.02 

. (a) 0.0888 (b) 0.6611

0.0625, 0.2574, 0.5, 0.7123 

. (a) 0.0139 (b) 0.1587 (c) 0.9332

. (a) 0.159 (c) 0.584 

, " kg, 57.0 g, 3.97%, 765 g 

. (a) (i) 0.1056 (ii) 0.8882 (b) 1028 (c) 0. 

10. (a) (i) 0.1056 (ii) 0.144 (b) 0.0188
(a) 0.1416 (b) 0.5999 (c) 14.96 m (d) 0.3043

12. (a) 0.798 (b) 0.323 (c) 0.132 (d) 0.228

(a) 0.252 (b) 0.0581 (c) 0.104




Mixed test 8A (page 419) 


1. (a) S ~ N(600, 105.8), 0.0724 (b) 0.8392
(c) 0.1606 (d) 30.54

2. (a) 0.733 (b) 0.984

3. (b) 0.0802 (c) 0.6729 


Mixed test 8B (page 420) 
1. (a) 0.127 (b) (i) 0.0016 (ii) 0 (c) 0.1003


2. (a) 0.8413 (b) 0.5 (c) 0.4207; 0.9938
3. 0.84 


Chapter 9 


Exercise 9a Sampling methods (page 430) 


2. (a) 6, 6, 6, 6, 6, 5, 5
4. (b) large : medium : small = 15 : 25 : 20


Exercise 9b Simulating random samples from 
given distributions (page 435) 


Some answers depend on the random numbers used and on the method of allocation. These are possible answers.

10. (a) 1, 1, 1, 0, 3 (b) 4

11. 33.134, 33.193, 28.712 

12. (a) 3,5 (b) 1,5 (c) 1007.2, 1016.8

13. 1.52 

14. mean of sample means ≈ distribution mean; variance of sample means ≈ (1/n) × variance of distribution

15. (a) 4 (b) 6.1826
Exercise 9c The distribution of the sample mean, X̄ (page 443)


0.0176 


. (a) 0.6234 (b) Approx. 4 
. (a) 0.1056 (b) 0.3092


= 2.88 
= — 0.7975 
(a) X ~ NU4.8, | (b) 


. (a) 8 (b) no
. (a) 0.2399 (b) 0.0787 (c) 0.0127 (d) n= 109 


0.9212, 
62 


. (a) 42 (b) 60 


5 


. 20,3 
. (a) 12 (b) 20 

. 20500, 1768; no 

, 0.332, 0.0587, 0.009 

. 0.4948, 0.4944, 0.1211 


(a) P(X =0)=4, W(X =1)=4, PX =2)= 4 
(b) % (c) 0.159 


Exercise 9d Distribution of sample proportions 
(large samples) (page 447) 


1. (a) 0.0745 (b) 0.0037


2. (a) 0.0057 (b) 0.527 (c) 0.1265 




. 0.0471 

. (a) 0.0648 (b) 0.0970 

. 0.7181 

. (a) 0.0648 (b) 0.0851 (c) 0.3068 
. (a) 0.22 


Exercise 9e Point estimates and confidence intervals for μ (page 460)




. 25.3, 3. 


236, 7.58 

(a) 48.875, 6.98 (b) 1.69, 8 x 10° (1 s.f.)
(c) 22.79, 1.81

(d) 15, 43.14 (c) 10,341 ( 9.71, 621.12 


. 0.5, 1.428 
- 205.16, 9.223 


(a) (139.16, 140.5) (b) random sample
(a) (10.75, 14.15) (b) 3.4

(a) (448.7, 467.3)

(b) The probability that this interval includes μ is 0.99.

(c) No, z value less

(a) (79.19, 84.81) (b) (78.89, 85.11)

(c) No, the central limit theorem can be used, since n is large.


. (68.0, 70.0), random sample, central limit theorem can be applied.

. (a) 3.612 (b) (747.3 g, 748.7 g)
(c) random sample, central limit theorem can be applied.


. (a) (1011, 1114) (b) 36 


. (a) 5.06 g (b) 89% 


. Histogram: frequency densities 1.2, 3.6, 6.4, 11.4, 20.4, 10.2, 5, 1.8; 91.32, 7.42, 0.43, (90.5, 92.2)

. (24.9, 25.8)

. Histogram: frequency densities 0.8, 0.48, 0.3, 0.18, 0.1, 
0.05, 0.04, 0.03, 0.02; 194, 176, (173.5, 214.5) 


Exercise 9f Confidence intervals: small samples (t-distribution) (page 468)

1. (a) (177.21 cm, 182.42 cm) (b) 4.91 cm


2. (a) (3.59, 4.68) (b) 0.146
3. (8.07, 9.13)
4. (32.08, 33.22), 380
5. (14.98 g, 15.78 g)
6. (9.804, 9.808)
7. (a) 5.13 (b) 0.588 (c) (4.70, 5.56)


Exercise 9g Confidence intervals for p 
(page 471) 


\o 90 


(a) (0.622, 0.738)
(b) The normal approximation to the binomial has been 
used in the underlying distribution. 


. (a) (0.293, 0.427) (b) (0.273, 0.447) 

. (a) (0.238, 0.362) (b) 90 

. (a) 0.28 (b) (0.176, 0.384)

. (0.156, 0.344)

. (a) (0.223, 0.352) (b) wider

. (a) Random sample (0.244, 0.283) (ii) 90 approximately


(b) (i) 0.26 


. (a) (0.351, 0.369) (b) 5277 


(0.509, 0.547) 


Miscellaneous exercise 9h (page 478) 




. (124.34, 125.60), 4

. (£93.59, £101.48)

. 1.13, 0.0603, (£1.07, £1.19)

. 9.71, (172.3, 173.3)

. (a) 3, (2.04, 3.96) (b) 30%, (25.2%, 34.8%) 

. 0.059, 0.61 

. (a) Lifetime of bulb follows a normal distribution; the items in the box constitute a random sample.
(b) (1774 hours, 1798 hours)
(a) 268 (b) smaller, critical z value less

. (0.139, 0.315); there is a 1% chance that the interval has not trapped it.

(a) 26.525, 1.24 (b) (26.20, 26.85) (c) justified

(d) n large, use central limit theorem

(a) (28.98 cm, 29.42 cm) (b) Large sample

(c) X normally distributed, random sample

(d) (26.78 cm, 31.62 cm)

(e) no; 30.5 out of range of 95% confidence interval for μ


. (92.32, 99.68) 

. (a) (202.4, 207.4) (b) 0.2, (0.057, 0.343)

. (0.123, 0.392), (170.84, 178.16), (165.57, 186.83)
. 25.35, 0.13, (25.15, 25.6), valid

. (a) (0.303, 0.357) 


(c) 10% probability that interval did not trap 45 people 
changed their minds at the last minute 


. (£35.60, £130.80) 
. (35.03 mg, 35.31 mg) 
. (13.10 mm, 14.72 mm) 


(47.02 cm, 51.38 cm)


. (0.0825 mm, 0.242 mm) 


Mixed test 9A (page 481) 


1. (a) 0.391 (b) 93%
2. 14
3. (0.23, 0.35); the normal approximation to the binomial has been used in the underlying theory; only cars in the car park were sampled, which may not constitute a random sample.
4. (18.51, 19.49)


Mixed test 9B (page 482) 


1. (b) (92.01, 93.19)
(c) Central limit theorem can be applied.


2. (0.35, 0.49), 0.14 


3. {a} [%- E+ 


38.64 38.64 


b) 6000 
i e)® 


4. (a) (244.2 g, 250.2 g) (b) 6.0 g (c) smaller
Exercise 10a Testing p in a binomial distribution (small samples) (page 494)

1. H₀: p = 0.7, H₁: p > 0.7; no evidence
2. (a) H₀: p = 1/6, H₁: p > 1/6
(b) There is no evidence that die is biased in favour of 4.
3. (a) Do not reject H₀ (b) Reject H₀
4. (a) Evidence to suggest decrease.
(b) No evidence to suggest decrease.


. (a) x > 5
(b) The probability that H₀ is rejected when it is in fact true. (c) 0.1


. (a) Accept H₀ (b) Reject H₀ (c) Reject H₀
(d) Accept H₀ (e) Accept H₀ (f) Reject H₀
(g) Reject H₀ (h) Accept H₀


. (a) Driving instructor is over-estimating pass rate.
(b) x > 3

. She could have been guessing.

. (a) x < 2 (b) 0.803

. (a) 15% (b) 0.15(09) (c) 28% (2 s.f.)
. (a) 7.5% (2 s.f.)
(b) same as significance level
(c) 66% (2 s.f.)


Exercise 10b Testing λ in a Poisson distribution (page 500)

1. Increased
2. (a) Not increased (b) Decreased
3. H₀: λ = 9, H₁: λ > 9, not increased
4. (a) 0.0424 (b) 0.849
5. (a) Accept H₀ (b) Accept H₀ (c) Accept H₀
(d) Reject H₀ (e) Accept H₀ (f) Reject H₀
6. H₀: λ = 3.5, H₁: λ > 3.5, not increased


Miscellaneous exercise 10c Binomial and 
Poisson tests (page 504) 


1. 


» (a) Hep 


(a) 0.028 (b) 0.131
(c) 0.261; H₀: p = 0.6, H₁: p > 0.6; teacher is not underestimating


. (a) 0.552 (b) 6, 0.296
(c) The probability he scores a penalty kick remains constant at 0.7.
(d) H₀: p = 0.7, H₁: p > 0.7
(e) No evidence of improvement (f) strengthened


- Manufacturer’s claim is not accepted; discrete distribution, 


P(X < 12) 


6%, P(X < 13) = 17.1%. 
H₁: p > 0.2
(b) X ~ B(25, 0.2) (c) 9


. (a) (i) 0.0278 (ii) 0.0384 (iii) 0.0768
(b) H₀: p = 0.5, H₁: p ≠ 0.5, no indication of whether looking for evidence of more males or more females.
(c) Evidence of more males than females, x > 13


. (a) 37% (b) 42%
(c) (i) The consumer group has used a high value for the significance level.
(ii) Choose a 5% or 10% significance level to maintain credibility.


(a) 2,148 (b) 0.302
(c) H₀: p = 0.2, H₁: p < 0.2, not reduced
(d) reduced
(a) 0.430 (b) 0.962 (c) 0.00459
(d) H₀: p = 0.9, H₁: p < 0.9, looking for a decrease
(e) No evidence that service has deteriorated.
(f) x < 12; P(X < 12) < 0.05, whereas P(X < 13) > 0.05

(a) Defects occur randomly and independently, with no two defects at the same spot.
(b) (i) 0.209 (ii) 0.221
(c) 0.140
(d) H₀: λ = 2.4, H₁: λ > 2.4, evidence that number of defects has increased.


10. (a) (i) 0.181 (ii) 0.999 (b) 0.018


11. (c) No evidence of decrease.
12. 0.0057, 9 mins, not significant


Mixed test 10A (Binomial) (page 506)

1. 0, 1, 9, 10

2. (a) H₀: p = 0.15, H₁: p < 0.15, evidence that new procedure has been successful.

(b) Staff making an effort during the first week, take 
sample over a longer period of time. 


3. No evidence to support gardener’s claim. 


Mixed test 10B (Poisson) (page 506) 


1. (a) Poisson, 2.1
(b) (i) 0.650 (ii) 0.222
(c) Evidence suggests higher rate.
2. (a) (i) 0.138 (ii) 0.847
(b) H₀: λ = 7.5, H₁: λ < 7.5, does not provide significant evidence.
3. (a) Nominally 5% (between 4.26% and 8.39%)
(b) 76% (2 s.f.)
Chapter 11 


Exercise 11a z-tests for a normal population or large sample size (page 522)


. (a) z = -1.095, accept H₀ (b) z = 1.845, reject H₀
(c) 5S, reject H₀ (d) z = -2.778, reject H₀
z = -0.943, no


. It could be 103.5 
. z = 2.487, yes
. z = 1.909, distribution of the sample mean is approximately normal.
z = 1.987, no evidence
. (a) x̄ < 91.5065 minutes (b) 0.0093 (c) 0.3286
z = 0.983, accept mean is zero
. 3.778 < x̄ < 6.222
. (a) z = 1.778, accept H₀ (b) z = 1.778, reject H₀
(c) z = -1.428, reject H₀ (d) z = -2.487, accept H₀
. (a) Reject H₀ and conclude mean is not 52. (b) 0.04
(a) 0.0817 (b) 0.665
z = 2.946, yes


. (b) 0.24 (c) μ > 389.7 (d) 0.0494


Exercise 11b t-tests for a normal population, small sample size (page 527)


1. (a) t = 0.909, accept H₀ (b) t = -1.89, accept H₀
(c) t = 2.15, reject H₀ (d) t = -3.07, accept H₀
2. t = 2.828, evidence of improved times
3. (a) t = -3.54, underweight (b) z = -3.2, underweight
4. t = -1.1, no
5. t = 2.284, mean greater than 4.3
6. (a) t = -3.23, no (b) (1.69, 2.88)
7. (a) z= -1.66, no change in mean 
(b) 0.324, t = -2.33, change in mean 
8. X is normally distributed, t = 1.80, accept null hypothesis
9. H₀: μ = 27, H₁: μ ≠ 27, t = 2.9, mean is 27
10. H₀: μ = 50, H₁: μ < 50, t = -0.435, not overstating


Exercise 11c Testing a binomial proportion, large n (page 532)


1. (a) z = 1.59, accept H₀ (b) z = 2.206, accept H₀
(c) z = -1.79, accept H₀ (d) z = 2.118, accept H₀
(e) z = 2.937, reject H₀
. z = -2.40, do not accept claim as there is evidence that proportion is less.
. z = 1.637, yes
z = 1.476, no
z = -1.990, no
221.5, no
. z = 1.705, evidence that more than 65% own a mobile phone.
8. (a) (i) 0.0297 (ii) 0.0934
(b) z = -1.792, germination rate less than 75% (only just; do further tests)
9. z = 2.43, yes
10. Replies were representative of the population. 
(a) z= 1.220, no evidence to suggest proportion in favour 
is more than 0.7. 
(b) (0.681, 0.808)
11. (a) Evidence that proportion is lower 
(b) No different 
12. (a) z = -3.03, evidence that p < 0.4
(b) (0.379, 0.458); 75
13. z = -1.267, no
14. z = -2.44, evidence that proportion has fallen
Exercise 11d Testing the difference between means of two normal populations


Section A: z-tests (page 543) 


1. (a) (i) z = -2.096, reject H₀ (ii) z = -1.402, accept H₀
(iii) z = 2.493, reject H₀
(b) (i) z = 1.99, accept H₀ (ii) z = 2.076, reject H₀
(iii) z = -2.036, accept H₀ (iv) z = 1.783, reject H₀
(v) z = 1.779, reject H₀ (vi) z = -2.321, accept H₀
(only just) (vii) z = 2.55, reject H₀
. 0.567, z = -2.219, flowers on sunny side grow taller
. z = 3.52, second population has smaller mean than first
. z = 2.036, significant at 5% level, not significant at 4% level
. 4.41, (9.87, 10.73), 3.61, z = 1.49, not significant evidence
z = -1.646, reject Mr Brown’s claim (only just)
z = -2.04, evidence of difference
1.15, z = -2.913, significant evidence
z = 1.627, accept; 124
10. 27.33 (26.77, 27.89), 2.4, z = 1.97, those of higher intelligence do not have greater foot length.


Section B: t-tests (page 545)
1. (i) (a) 17.73 (b) t = 2.135, reject H₀
(ii) (a) 87.09 (b) t = -0.567, accept H₀
(iii) (a) 27.5625 (b) t = 2.088, accept H₀
(iv) (a) 4.182 (b) t = 1.260, accept H₀


t = -1.13, no evidence that Welsh policemen are shorter than Scottish policemen.
. 196, z = -1.714, do not differ significantly
. (a) 10.8125 (b) t = -1.282, accept claim
. t = 2.423, not significant difference
. Normal populations with common variance, t = 2.36, evidence that mean has increased; t = 2.041, the mean could be 500 g.
7. (a) Normal populations with common variance
(b) H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂ (c) t = -0.942, same
8. t = 1.868, evidence that new method has led to higher scores; (-2.60, 33.9)




Miscellaneous exercise 11e (page 554) 


1. (17.1, 19.7), there is a 10% chance that it hasn’t trapped μ; z = 0.759, μ could be 17.8
2. (a) Children within families selected are representative of 
all children. 
(b) z=0.939, data do not indicate that boys and girls are 
not equally likely 
3. (a) H₀: μ = 30, H₁: μ > 30 (b) x̄ > 33.95
(c) Evidence that mean speed is greater than 39 mph (x̄ is in critical region).
(d) 0.9941
4. (a) H₀: μ = 43, H₁: μ > 43
(b) Since n is large, the distribution of the sample means is approximately normal
(c) z = 1.768, mean amount has increased
(d) (43.35, 52.65), consistent, 43 out of range of confidence interval
5. H₀: p = 0.13, H₁: p < 0.13, 2%, 0.161
6. (a) (i) P(X̄ > critical value | μ = 65)
(ii) P(X̄ < critical value | μ is value specified by the alternative hypothesis)
(c) Accept H₀, Type II
(d) 0.0059, Type II error would be less and tends to zero as μ increases
7. Not representative as it excludes people at work, school, 
etc; better to take random samples at random times during 
the day for a spread of days, (68%, 80%). z = -2.03, data
provide significant evidence
8. (a) $\bar{x} > \mu_0 + 1.96\frac{\sigma}{\sqrt{n}}$ or $\bar{x} < \mu_0 - 1.96\frac{\sigma}{\sqrt{n}}$
(b) $\bar{x} > \mu_0 + 2.326\frac{\sigma}{\sqrt{n}}$
9. (a) 0.422 
(b) E(unbiased estimate) = true value; batch not rejected, 
9.6% 
10. (a) Hy p>s (b) N50 (c) 0.059 ~6% 
11. Accept as slow if mean bounce <11.645, 0.0004 
12. (a) (i) 10.46 (ii) 15.64, E(unbiased estimate) = true value 
(b) 1 
(c) Central limit theorem holds when n is large
13. z = 3.367, accept claim that mean duration is more than 42 months; n large, use central limit theorem
14. (a) 75 
(b) z= 2.19, machine is not correctly calibrated 
(c) Unbiased estimate of standard deviation used, distribution of sample mean approximately normal; (0.316, 5.684)
(d) Smaller, might lead to result that machine is correctly calibrated.
15. (a) 66.25, 133.40 (b) H₀: μ = 62.5, H₁: μ > 62.5, z = 1.465, no evidence of increase
16. (a) z = 2.475, mean has increased


17. (a) H₀: μ = 1.73, H₁: μ > 1.73 (b) X~NO
() 2>1777° * » is itl 
rr i) men who play basketball are not taller (e) 0.14


Test 11A (z-tests) (page 558) 
1. (a) 21.25 
(b) z=0.99, no evidence to support manufacturer’s 
suspicion 
(c) obtaining distribution of X̄, distribution of X not known
2. z = 1.567, not sufficient evidence to say that the quoted figure is an underestimate
3. (a) (i) P(reject H₀ when H₀ is true)
(ii) P(accept H₀ when H₁ is true)
(b) (i) z = 2.372, mean is greater than 17.5
(ii) x̄ > 18.09
(iii) 0.639
4. 0.0606 (≈ 6%), 0.1118


Test 11B (z-tests) (page 558) 


1. H₀: μ = 125, H₁: μ < 125, z = -1.549, no evidence that μ is lower
2. z = 2.318, government spokesman
3. (a) x̄ < 59.82
(b) It is accepted that the mean is 60 when in fact it is an alternative value (less than 60).
(c) 0.057
4. (a) H₀: μ₁ - μ₂ = 0, H₁: μ₁ - μ₂ ≠ 0
(b) z is 1.6, no difference
(c) Distribution likely to be skewed rather than symmetric


Test 11C (t-tests) (page 559) 


1. t = -3.560, evidence that mean falls below $7.40; normal distribution
2. t = -2.915, San Marco cooler
3. (a) 4.238 (b) Normal distribution, t = 2.857, yes
(c) z-test not t-test
4. t = -2.046, new score higher; (-6.948, 32.282)
There will be variation in answers, depending on the degree of accuracy used in various stages of the working.


Exercise 12a Goodness of fit test: uniform and given ratio (page 569)

1. X² = 1.93, ν = 3, die is fair

2. X² = 18.16, ν = 9, uniform distribution

3. X² = 6.19, ν = 2, yes

4. X² = 4.95, ν = 3, no; X² = 9.90, ν = 3, yes

5. X² = 8.24, ν = 7, accept theory

6. X² = 4.15, ν = 4, yes

7. X² = 10.68, ν = 4, no

8. 15.5

9. 78.81, 17.8, 7.8, 6.7; X² = 5.92, ν = 3, no
10. X² = 38.2, ν = 9, evidence of bias
11. X² = 10, ν = 4, not uniform
12. X² = 4.4, ν = 5, die is fair

13. (a) modal class 2 to <4
(b) 4 years 8 months, 3 years 2 months
(c) For cumulative frequency curve plot (0, 0), (2, 42), (4, 94), (6, 122), (8, 142), (10, 160), (12, 176); 3 years 9 months, 4 years 9 months
(d) X² = 5.73, ν = 5, justified


Exercise 12b Goodness of fit tests: binomial, Poisson and normal distributions


1. Combine last three classes, X² = 4.09, ν = 3, accept
2. X ~ B(5, 1/6), E = 80.5, 80.5, 32, 7 (last 3 classes combined)
X² = 8.21, ν = 3, biased; x̄ = 1, p = 0.2, X ~ B(5, 0.2),
E = 66, 82, 41, 11 (last three classes combined); X² is very small, ν = 2, too good a fit, query data.
3. np = 1.6, p = 0.32, E = 7.3, 17.1, 16.1, 7.5, 1.8, 0.2 (combine last 3 classes), X² = 1.79, ν = 2, good fit
4. X ~ B(2, 1/6), E = 150, 60, 6, X² = 9.6, ν = 2, reject; use x̄ = 0.444, p = 0.222, find E, ν = 1
5. (a) x̄ = 2, p = 0.4, E = 6, 21, 28, 18, 7 (combine last 2 classes)
(b) X² = 2.21, ν = 3 (c) yes, binomial adequate
6. (a) X ~ B(5, 0.2088), E = 155, 205, 108, 28, 4, 0 (combine last 3 classes)
X² = 5.959, ν = 2, binomial (but only just)
(b) 2 = 20, p = 0.35; 0.16135, 8.1
(c) 12.3
(d) E = 12.3, 8.6, 9.2, 8.1, 11.8, O = 9, 7, 17, 8, 9, X² = 8.46 (e) 3, not good fit at 5% level
E = 246.6, 345.2, 241.7, 112.8, 39.5, 14.2; X² = 32.2, ν = 5, not accepted
x̄ = 2.5, E = 8, 21, 26, 21, 13, 11 (combine end classes),
10. x̄ = 1.28, E = 41, 52, 34, 14, 6 (combine end classes), ν = 3, not significant
x̄ = 0.65, E = 20.88, 13.57, 5.55 (combine end classes), X² = 1.85, ν = 1, accept
(a) x̄ = 1.2, E = 99, 119, 72, 29, 9, 2 (combine end classes)
(b) X² = 0.48, ν = 3, very good fit
13. x̄ = 0.9, E = 21, 18, 11 (combine last 3 classes), X² = 1.80, ν = 1, yes, consistent
14. (b) E = 7.3, 12.4, 10.6, 9.7
(c) X² = 1.77, ν = 2, reasonable
(d) very low, suspicious
E = 6.68, 9.19, 14.98, 19.15, 19.15, 14.98, 9.19, 6.68; X² = 3.197, ν = 7, accept. If μ, σ² unknown, ν = 5
16. (a) E = 3, 13, 28, 32, 18, 6 (combine first 2 classes), X² = 119, ν = 4, reject
(b) x̄ = 171.54, s = 7.11, E = 6, 18, 32, 28, 13, 3 (combine last 2 classes), X² = 1.73, ν = 2, accept normal
17. (a) x̄ = 1.732, σ = 0.216 (3 d.p.), E = 7.78, 26.05, 44.12, 33.64, 13.41, X² = 8.96, ν = 2
(b) X² = 2.42, ν = 1


Exercise 12c Contingency tables (page 588) 


1. E = 48, 10.67, 21.33, 42, 9.33, 18.67, X² = 1.037, ν = 2, no difference
2. E = 25.5, 25.5, 60.5, 60.5, 26.5, 26.5, 7.5, 7.5, X² = 2.03, ν = 3, independent
. E = 50.1, 29.5, 23.4, 22.9, 13.5, 10.6, X² = 4.00, ν = 2
. E = 65.1, 28.9, 58.9, 26.1, X² = 7.43, ν = 1, yes
. E = 27.5, 972.8, 27.5, 972.5, X² = 4.79, ν = 1, yes
6. E = 11.4, 14.3, 8.6, 15.7, 18.3, 22.9, 13.7, 25.1, 20.6, 25.7, 15.4, 28.3, 29.7, 37.1, 22.3, 40.9, X² = 12.0, ν = 9, accept

7. E = 66.7, 33.3, 53.3, 26.7, X² = 6.81, ν = 1, no

8. E = 21.0, 10.0, 7.0, 15.5, 7.5, 5, 41.5, 19.5, 14, X² = 7.86, ν = 4, accept

9. E = 34.2, 29.8, 12.8, 11.2, X² = 1.22, ν = 1, no

10. E = 17.5, 82.5, 17.5, 82.5, X² = 0.58, ν = 1, no

11. E = 202.2, 260.7, 318.1, 184.8, 238.3, 290.9, X² = 2.02, ν = 2, independent

12. ν = 2, X² = 5.99

13. (a) ν = (3 - 1)(3 - 1) = 4 (b) difference

14. E = 13.5, 15.5, 21, 8.64, 9.92, 13.44, 31.86, 36.58, 49.56, X² = 11.35, ν = 4, yes

15. E = 33.6, 22.4, 63.6, 42.4, 22.8, 15.2, X² = 4.775, ν = 2, no difference

16. E = 90.405, 56.595, 35.595, 20.405, X² = 13.3, ν = 1, related


Miscellaneous exercise 12d (page 594) 


1. H₀: Preference for proposed route is independent of where people live (no association between them); H₁: There is an association between them. E = 47, 28, 31.33, 18.67, 15.67, 9.33, X² = 1.479, ν = 2, no association

2. (a) H₀: Occurrence of shoplifting is uniformly distributed between the months; H₁: Shoplifting is more likely to occur in some months than others.
E = 14.5 (all classes), X² = 14.268, ν = 11, no association

3. H₀: No association between reaction and eye colour; H₁: There is an association between them.
E = 15.675, 7.425, 9.9, 26.125, 12.375, 16.5, 15.2, 7.2, 9.6, X² = 20.9, ν = 4, association between reaction and eye colour

4. E = 6.3 (combine first 4 classes), 8.85, 14.66, 18.99, 19.28, 15.32, 9.52, 7.05 (combine last 3 classes), X² = 4.908, ν = 6, good fit
5. (a) H₀: No association between brand of fertiliser and yield; H₁: There is an association between them.
E = 10, 12, 8, 8, 9.6, 6.4, 7, 8.4, 5.6, X² = 7.811, ν = 4, no association
(b) ν = 2, there is an association between choice of company and yield
(c) Quickgrow
6. (a) H₀: Peak flow measurements are normally distributed (with mean and variance as estimated from the data), accept
(b) Expected frequency must be greater than 5, combine classes

7. (a) E = 18.33, 20.67, 20.68, 23.32, 7.99, 9.01

(b) X² = 6.88, ν = 2, there is an association

8. (a) H₀: No association between candidates’ grades in mathematics and physics; H₁: There is an association between them. E = 17.2, 13.8, 12.8, 10.2, X² = 5.672, ν = 1, there is an association

(b) Expected frequency might drop below 5

9. (a) E = 44.62, 66.94, 50.2, 25.12, 13.12 (combine last two), X² = 10.6, ν = 4, at 5% level, no

(b) Use mean from data for λ, ν = n - 2 = 3

(c) Do not have independent events with a constant probability of success.

10. E = 644 (all classes), X² = 10.95, ν = 4, data do not support claim, random events, x̄ = 1.1, E = 33.3, 36.6, 20.1, 10 (combining last 2 classes), X² = 0.48, ν = 2, accept


11. (a) H₀: Grades are in the ratio 15 : 20 : 35 : 25 : 5; H₁: Grades are not in this ratio. E = 30, 40, 70, 50, 10, X² = 7.074, ν = 4, same proportion

(b) H₀: Sex and grade are not associated; H₁: There is an association between them. ν = 4, there is an association

12. (a) x̄ = 0.74, combine 4 and over, E = 667.96, 494.29, 182.89, 45.11, 9.75, X² = 108.87, ν = 3, not adequate

(b) Not consistent, Poisson model was not adequate

(c) E = 259 (all classes), X² = 13.8, ν = 3, not consistent

13. (a) E = 19.33, 15.33, 10, 15.33, 9.67, 7.67, 5, 7.67, X² = 12.08, ν = 3, mark is associated with type of question

(b) Poisson, this is most similar question

(c) E = 22.5, 22.5, 22.5, 22.5, 7.5, 7.5, 7.5, 7.5, X² = 17.6, ν = 3, yes it is

(d) Contingency table: popular and well answered; binomial and Poisson fits: average popularity, relatively badly answered; normal fit: unpopular but well answered by those who attempted it.

14. (a) E = 480 (all classes), X² = 14.8, ν = 4, there is evidence

(b) E = 6.405, 6.51, 8.085, 24.705, 25.11, 31.185, 29.89, 30.38, 37.73, X² = 16.9, ν = 4, length of employment is associated with grade

15. E = 38.78, 34.02, 18.2, 39.21, 34.39, 18.4, 27.28, 23.92, 12.8, 22.59, 19.81, 10.6, 22.59, 19.81, 10.6, 28.55, 25.05, 13.4, X² = 16.0, ν = 10, no association; expected frequency must be greater than 5, would not make sense; E = 6 (all classes), X² = 13.2, ν = 6, reject hypothesis

16. (a) x̄ = 1, E = 36.79, 36.79, 18.39, 8.026 (3 or more), X² = 12.9, ν = 2, not suitable
(b) E = 26.25, 6.75, 8.75, 2.25, X² = 3.77, ν = 1, yes


Mixed test 12A (page 598) 


1. (a) 1.04 (b) Calls occur at random
(c) E = 58.01, 55.11, 26.18, 10.70, X² = 4.86, ν = 3, Po(0.95) is suitable

2. E = 6, 20, 12, 2, 9, 30, 18, 3, X² = 13.11, ν = 3, there is a link between General Studies performance and degree class

3. E = 15.9, 21.1, 26.1, 21.1, 15.9 (combine first 2 and last 2 classes), X² = 7.08, ν = 4, N(180, 9) is suitable

4. H₀: There is no association between gender and passing a driving test.
H₁: There is an association.
E = 27.5, 22.5, 27.5, 22.5, X² = 2.585, ν = 1, results do not indicate link


Mixed test 12B (page 599) 


1. E = 32 (all classes), X² = 1.6875, ν = 3, no particular preference, data cannot be used to discredit claim

2. E = 21.34, 43.66, 21.66, 44.34, X² = 4.72, ν = 1, there is an association between the two factors

3. E = 16.67, 16.67, 16.67, 16.67, 33.33, 50, X² = 4.65, ν = 5, die is biased in the way described

4. (b) Random positions (c) 37.24, 2.50

(d) H₀: The distribution can be modelled by Po(2.59); H₁: The distribution cannot be modelled by Po(2.59). X² = 7.55, ν = 5, Poisson model is supported by data


Chapter 13 


Exercise 13a Significance test for product-moment correlation coefficient (page 604)
1. Reject a, c, f, g, h; do not reject b, d
2. (a) 0.3755
(b) H₀: ρ = 0, H₁: ρ > 0, reject H₁, no evidence
(c) X and T are jointly normally distributed with correlation coefficient ρ, and the data constitute a random sample from all values of x and t.
3. (a) Scatter diagram
(b) 0.834
(c) reasonable
(d) Student’s view is wrong; correlation does not imply causation; in this case there may be a common underlying cause such as wealth.


(a) -0.690, reject H₀ in favour of H₁
(b) 0.686, reject H₀ in favour of H₁


Exercise 13b Significance test for Spearman’s rank correlation coefficient (page 607)

1. Reject b, f, h; do not reject a, c, d, g, i, e
2. 0.714, no evidence of agreement (only just)
3. (b) H₀: ρₛ = 0, H₁: ρₛ > 0, do not reject H₀, no evidence of agreement between the interviewers
4. 0.745, evidence of correlation
5. (a) 0.66 (b) evidence of positive correlation
6. 0.4286, H₀: ρₛ = 0; no evidence of positive correlation


Miscellaneous exercise 13c (page 612) 
1. (a) 0.636
(b) H₀: ρₛ = 0, H₁: ρₛ > 0. Accept H₀, no evidence of positive correlation.

2. (b) 0.916
(c) Evidence of positive correlation between the number of wren territories recorded and the number of adult wrens trapped.

7. (a) c = 2.48 + 0.607m


8. 0.690, Het p, =0, Hy, 
, Hy: p, 
9. (b) 0.783
3. (b) -0.975 (c) strong negative correlation
(d) There is evidence of correlation between sunshine and temperature.
4. (a) 0.4286
(b) H₀: ρₛ = 0, H₁: ρₛ ≠ 0, no evidence of correlation between attendance and position in the
. 0.527, no evidence of agreement
6. (b) 0.535
(c) some positive correlation
(d) Low mark in x, high mark in y
(e) 0.794
(f) H₀: ρₛ = 0, H₁: ρₛ > 0, evidence of positive correlation
(b) 0.593
(c) 11.4 (d) 0.516, no (e) r


#0, no evidence of correlation 


(c) H₀: ρ = 0, H₁: ρ > 0, evidence of positive correlation
(c) Data constitute a random sample of all values of x and y; years selected may not be representative.
(d) lower

. 0.825, 0.929, evidence of positive correlation (1% level)
. 0.3341, not significant (5%), -0.6939, significant (2.5%)


Mixed test 13A Correlation coefficients

(a) 0.473 (b) evidence
. (a) 0.667
(b) H₀: ρₛ = 0, H₁: ρₛ > 0, judges in broad overall agreement
(c) evidence of correlation
(d) Spearman’s rank

. r = 0.310, not significant, no evidence of possible correlation

. (a) 0.619

(b) H₀: ρₛ = 0, H₁: ρₛ > 0, no evidence of positive correlation (only just)

(c) Two very different sets of data being compared
normal approximation to
Poisson approximation to
significance (hypothesis) test, n large
n small
box and whisker diagram (box plot)
central limit theorem
contingency tables
continuity correction, normal to binomial
Yates’ (χ² test)
continuous data
Spearman’s rank
Spearman’s rank
tables 


use of 


485, 509 


183 
485, S11, 566, 583, 604 
186 


483-492 


561, 579, 585 
5 


5. 
450-457, 462 


652 
602, 605 


critical region 485, 509
critical value 485, 509
cumulative distribution function, continuous 334
discrete 253 
uniform distribution 348 
cumulative frequency curve, polygon 61 
percentage diagrams 65 
step diagram So 
cumulative probability tables, binomial 645 
Poisson, 647 
data, continuous 2 
discrete za 


grouping 
degrees of freedom, χ² distribution
t-distribution 


dependent variable 121 
difference between means hypothesis test 534 
difference between random variables 257 
normal variables 407 
distribution, frequency 2, 3, 9
probability 233
function (cumulative) 334
of sample mean 436 
of sample proportion 445 
shape 20 
equally likely events 171 
errors, type I and type II 493, 520
estimation of population parameters 4 
exhaustive events 180 
expectation, continuous random variable 320 
discrete random variable 237
experimental probability 169 
frequency density 12 
frequency distribution 9 
curve, polygon 17,19 
geometric distribution 271 
diagrammatic representation 272
expectation and variance 275 
mode 273 
progression (use in probability) 205 
histogram 11 


hypotheses, alternative, null 
hypothesis tests 


485, 511, 566, 583, 601 
483, 507, 560, 600 


independence (χ² test for) 582
independent events 185 
variable 121 
interpercentile range 68 
interquartile range 68 
interval estimation (confidence intervals) 449 
width 457
least squares regression lines 121 
level of significance 485, 509 
linear combination of normal variables 403 
of random variables 336 
linear correlation 119 
linear interpolation for median, quartiles 78 
lower quartile 69, 71, 75 
continuous random variable 336 
mean, use of calculator 31 
confidence interval of 450 
discrete data 28 
distribution of sample 436
frequency distribution 30 
hypothesis test, difference between means 534 
mean 514-520, 524 
Poisson mean 496 
unbiased estimate 447 
weighted 36 
median, data 69 
continuous random variable 335 


linear interpolation 78 
mid-interval value 30
modal class 12 
mode, raw data 2, 
continuous random variable 329 
multiple of random variables 246, 250 
normal variables 409 
multiplication law (probability) 186 
mutually exclusive events 179 
negative correlation 119 
negative skew 84, 95 
non-parametric test 605 
normal approximation to binomial 382 
to Poisson 390 
distribution 89, 360
goodness of fit test (χ²) 576
tables (standard normal) 649 
use of 362-377 
null hypothesis (H₀) 485, 511, 566, 583, 601
one-tailed tests 489, 511 
or rule, probability 183 
outlier 98 
Pearson’s coefficient of skewness 85 
percentile 68 
permutations 214
pie diagrams 24
point estimates 447
Poisson, approximation to binomial 299 
cumulative probability tables 647 
use of 294 
diagrammatic representation 295 
distribution 292 
expectation and variance 293 
fitting a theoretical distribution 296 
goodness of fit test (χ²) 573
hypothesis test for mean 496
mode 296 
normal approximation to 390 
sum of two variables 301
unit interval 293 
pooled two-sample estimate (variance) 535 
population 421 
positive correlation 119 
positive skew 84, 95
possibility space 172 
power of a test 521
probability 168 
addition law (or rule) 183 
arrangements, permutations and combinations 206 
Bayes’ theorem 197
complementary event 172 
conditional events 182 
density function (p.d.f.), continuous 314
cumulative distribution 341
discrete 234 
distribution 233 
exhaustive events 180 
experimental 169 
independent events 185 
multiplication law (and rule) 186 
mutually exclusive events 179 
subjective 171 
trees 193 
product-moment correlation coefficient 139 
significance of 600 
table of critical values 652 
proportion, confidence interval 469
distribution of sample 445 
unbiased estimate 447 
significance test, n large 528
n small 483
quartile coefficient of skewness 88 
quartiles, ungrouped data 69, 71
continuous random variable 336 
grouped data 25 
quota sampling 423 
random number table 653 
use of 425 
random sampling 424 
from frequency distribution 431 
from probability distribution 432 
random variables, continuous 314 
difference between 257 
discrete 233 


multiples of 246, 259 
sum of 256 
range 37 
interpercentile 68 
interquartile 68 
rank correlation 146
rectangular (uniform) distribution, continuous 345 
mean and variance 347 
discrete 240 
regression, coefficients of 124, 142 
function 119
least squares lines 119 
calculator 133 
rejection criteria (rules) 513 
rejection region 485, 509 
sample mean 436 
proportion 445 
sampling distribution of means 436 
proportions 445 
sampling methods 424 
cluster 429 
design 422 
frame 429 
quota 423 
stratified 428 
systematic 427 
units 423 
scaling sets of data 51
scatter diagram 118 
significance level 485, 509 
tests 483, 507, 560, 600 
simulating random samples 431 
skewness 84 
quartile coefficient of 88
Pearson’s coefficient of 85
Spearman’s rank correlation coefficient 146 
significance of 605 
table of critical values 652 
standard deviation, discrete random variable 249 
calculator 40 
frequency distribution 41 
raw data 37 
standard error of mean 438 
of proportion 445 
standard normal variable 361
cumulative tables 649 
use of 362 
stratified sampling 428 
stem and leaf diagrams (stemplot) 4 
back to back stemplot 7 
step diagrams 59 
sum of random variables 256 
normal 403 
Poisson 301
survey 422 
systematic sampling 427 
t-distribution 462
tables 650
use of 464
t-tests 524
test statistic 485, 547
tied ranks 150
tree diagrams 193
two-tailed tests 489, 511
type I and type II errors 493, 520
unbiased estimate 447 
uniform distribution (rectangular), continuous 345
discrete 270 
goodness of fit test 567 
unit interval (Poisson distribution) 293 
upper quartile 69, 71, 75
continuous random variable 336 
variance, from data 38 
random variables, continuous 327 
discrete 248 
unbiased estimate 447 
pooled from two samples 535
Venn diagram 172, 175 
weighted mean 36 
width, confidence interval 457 
interval 3 
Yates’ continuity correction 586 


